
How to remove HTML tags from Word content?

I know there are several threads that suggest simply using

Regex.Replace(input, "<.*?>", String.Empty);

      

but I can't use it on text written to a Word document. My code looks like this:

Microsoft.Office.Interop.Word.Document wBelge = oWord.Documents.Add(ref oMissing,
    ref oMissing, ref oMissing, ref oMissing);
Microsoft.Office.Interop.Word.Paragraph paragraf2;
paragraf2 = wBelge.Paragraphs.Add(ref oMissing);
paragraf2.Range.Text ="some long text";

      

I can change it using search and replace, like this:

Word.Find findObject = oWord.Selection.Find;
findObject.ClearFormatting();
findObject.Text = "<strong>";
findObject.Replacement.Text = "";
findObject.Replacement.ClearFormatting();               

object replaceAllc = Word.WdReplace.wdReplaceAll;
findObject.Execute(ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref replaceAllc, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

      

Do I need to do this for every HTML tag?



2 answers


With some help from the comments, I figured out the following working solution:

findObject.ClearFormatting();
findObject.Text = @"\<*\>";
findObject.MatchWildcards=true;                     
findObject.Replacement.ClearFormatting();
findObject.Replacement.Text = "";                       

object replaceAll = Word.WdReplace.wdReplaceAll;
findObject.Execute(ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref replaceAll, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

      



It uses the search pattern \<*\> (containing the wildcard *, hence findObject.MatchWildcards must be set to true).



Try the following:

Convert your text with HTML substrings to a simple string using



string unFormatted = paragraf2.ToString(SaveOptions.DisableFormatting);

      

and then replace the paragraf2 parameter with the unFormatted string.
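
One way to read this suggestion is to clean the string before it ever reaches Word, using the regex quoted in the question. The following is only a minimal sketch under that assumption: `TagStripper` and `StripTags` are hypothetical names that do not appear in the thread, and the assignment to `paragraf2.Range.Text` is shown as a comment so the block compiles without the Office interop assemblies.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original thread.
public static class TagStripper
{
    // Lazy match: "<", then as few characters as possible, then ">".
    // Singleline lets "." match newlines, so a tag broken across lines is still removed.
    private static readonly Regex Tag = new Regex("<.*?>", RegexOptions.Singleline);

    public static string StripTags(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }
        return Tag.Replace(input, string.Empty);
    }
}

// Assumed usage with the question's variables:
//   string plain = TagStripper.StripTags("<p>some <strong>long</strong> text</p>");
//   paragraf2.Range.Text = plain;
```

Note that this pattern treats anything between a `<` and the next `>` as a tag, so ordinary text such as "a < b and c > d" would lose the part between the two symbols.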
