When using MergeField FieldCodes with the Open XML SDK in C#, why do field codes disappear or become fragmented?

I've been working with the C# Open XML SDK for a long time (the unofficial Microsoft package 2.5 from NuGet), but I recently noticed that the following line of code returns different results depending on what mood Microsoft Word is in when the file is saved:

var fields = document.Descendants<FieldCode>();

      

From what I can tell, when you first create a document (using Word 2013 on Windows 8.1), if you use Insert -> Quick Parts -> Field, select MergeField from the left-hand pane, specify the field name in the field properties and click "OK", then the field code is saved correctly in the document, as you would expect.

Then, using the line of code above, I get a field code count of 1. If I later edit this document (even leaving this field alone), a subsequent save can mean that this field code is no longer returned by my query.

Another case of the same curiosity is when I see FieldCode nodes split across multiple elements. So instead of seeing, say:

" MERGEFIELD  Author  \\* MERGEFORMAT "

      

as the node value, I will see:

" MERGEFIELD  Aut"
"hor  \\* MERGEFORMAT"

      

that is, the value split across two FieldCode nodes. I have no idea why this happens, but it certainly makes matching nodes much more interesting. Is this expected behavior? A known bug? I really don't want to have to open up the raw XML and edit this document by hand until I understand what's going on. Thank you all very much.



2 answers


Word often splits a text run into multiple text runs for no reason I've ever understood. When searching, comparing, ordering, and so on, we first process the body with a method that combines multiple runs into one text run.



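    using System.Collections.Generic;
    using System.Linq;
    using DocumentFormat.OpenXml;
    using W = DocumentFormat.OpenXml.Wordprocessing;

    /// <summary>
    /// Combines consecutive runs that have identical run properties.
    /// Only runs that are direct children of a paragraph are considered,
    /// and only the first w:t element of each run is merged.
    /// </summary>
    /// <param name="body">The document body, modified in place.</param>
    public static void CombineIdenticalRuns(W.Body body)
    {
        List<W.Run> runsToRemove = new List<W.Run>();

        foreach (W.Paragraph para in body.Descendants<W.Paragraph>())
        {
            List<W.Run> runs = para.Elements<W.Run>().ToList();

            // Walk backwards so text accumulates into the earliest matching run.
            for (int i = runs.Count - 2; i >= 0; i--)
            {
                W.Text text1 = runs[i].GetFirstChild<W.Text>();
                W.Text text2 = runs[i + 1].GetFirstChild<W.Text>();
                if (text1 != null && text2 != null)
                {
                    // Compare formatting by serialized run properties.
                    string rPr1 = "";
                    string rPr2 = "";
                    if (runs[i].RunProperties != null) rPr1 = runs[i].RunProperties.OuterXml;
                    if (runs[i + 1].RunProperties != null) rPr2 = runs[i + 1].RunProperties.OuterXml;
                    if (rPr1 == rPr2)
                    {
                        text1.Text += text2.Text;
                        // Keep leading/trailing spaces in the merged text.
                        text1.Space = SpaceProcessingModeValues.Preserve;
                        runsToRemove.Add(runs[i + 1]);
                    }
                }
            }
        }

        // Remove the now-redundant runs after iteration has finished.
        foreach (W.Run run in runsToRemove)
        {
            run.Remove();
        }
    }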
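
Note that the method above merges `w:t` text only, while field instructions are stored in `w:instrText` (`FieldCode`) elements, so for the question as asked it may be simpler to read the instruction back without modifying the document. Below is a minimal sketch of that idea, not a definitive implementation. `FieldInstructionReader` and `ReadFieldInstructions` are hypothetical names, not part of the SDK. It assumes complex fields are delimited by `FieldChar` begin/separate/end markers, and it also assumes that some of the "disappearing" fields were saved as simple fields (`w:fldSimple`), which carry the instruction in an attribute and contain no `FieldCode` element at all.

```csharp
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class FieldInstructionReader
{
    // Returns one instruction string per field found under 'root',
    // joining FieldCode fragments that belong to the same field.
    public static List<string> ReadFieldInstructions(OpenXmlElement root)
    {
        var instructions = new List<string>();

        // One entry per complex field that is currently open. Fields can
        // nest, e.g. an IF field whose instruction contains a MERGEFIELD.
        var openFields = new Stack<OpenField>();

        foreach (OpenXmlElement element in root.Descendants())
        {
            // Simple fields keep the whole instruction in the w:instr attribute.
            var simple = element as SimpleField;
            if (simple != null && simple.Instruction != null)
            {
                instructions.Add(simple.Instruction.Value);
                continue;
            }

            // Append each instruction fragment to the innermost open field.
            var code = element as FieldCode;
            if (code != null)
            {
                if (openFields.Count > 0 && openFields.Peek().InInstruction)
                    openFields.Peek().Text.Append(code.Text);
                continue;
            }

            var fieldChar = element as FieldChar;
            if (fieldChar == null || fieldChar.FieldCharType == null) continue;

            FieldCharValues type = fieldChar.FieldCharType.Value;
            if (type == FieldCharValues.Begin)
                openFields.Push(new OpenField());
            else if (type == FieldCharValues.Separate && openFields.Count > 0)
                openFields.Peek().InInstruction = false; // field result follows
            else if (type == FieldCharValues.End && openFields.Count > 0)
                instructions.Add(openFields.Pop().Text.ToString());
        }

        return instructions;
    }

    // Accumulates the instruction text of one complex field.
    private sealed class OpenField
    {
        public readonly StringBuilder Text = new StringBuilder();
        public bool InInstruction = true;
    }
}
```

It could be called as `FieldInstructionReader.ReadFieldInstructions(document)`, with the same `document` element used in the question. Nested fields are reported innermost first, and the outer field's string does not include the inner field's instruction, so an IF field that wraps a MERGEFIELD yields two separate entries.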

      



I ran into this problem myself and found that a solution already exists for Open XML: the MarkupSimplifier utility class, which is part of the PowerTools for Open XML project. This class solved every problem I was facing.

The full article is here.

Here are some relevant excerpts:

Perhaps the most useful simplification it accomplishes is to combine contiguous runs with the same formatting.

It goes on to say:

Open XML applications, including Word, can split runs as needed. If, for example, you added a comment to a document, the runs will be separated at the start and end of the comment. After the MarkupSimplifier removes the comments, it can concatenate runs, resulting in simpler markup.

An example of using the utility class:



// Requires "using OpenXmlPowerTools;". wordDoc is a WordprocessingDocument opened for editing.
SimplifyMarkupSettings settings = new SimplifyMarkupSettings
{
    RemoveComments = true,
    RemoveContentControls = true,
    RemoveEndAndFootNotes = true,
    RemoveFieldCodes = false,
    RemoveLastRenderedPageBreak = true,
    RemovePermissions = true,
    RemoveProof = true,
    RemoveRsidInfo = true,
    RemoveSmartTags = true,
    RemoveSoftHyphens = true,
    ReplaceTabsWithSpaces = true,
};
MarkupSimplifier.SimplifyMarkup(wordDoc, settings);

      

I've used this many times with Word 2010 documents, using VS2015 and .NET Framework 4.5.2, and it has made my life a lot easier.

Update

I revisited this code and found that it cleans up the runs for MERGEFIELDs, but not for IF fields that reference merge fields, for example:

{if {MERGEFIELD When39} = "Y???" "Y" "N" }

      

I don't know why this might be the case, and looking at the underlying XML offers no clues.
