Generating HTML from an XML tree (C# / .NET)
I have an HTML document stored in memory as a tree of Linq-to-XML objects. How can I serialize XDocument as HTML, taking into account the specifics of HTML?
For example, empty tags such as <br/> should be serialized as <br>, whereas an empty <div/> should be serialized as <div></div>.
HTML output is possible from an XSLT stylesheet, and XmlWriterSettings has an OutputMethod property that can be set to HTML, but the setter is internal, apparently reserved for XSLT or Visual Studio, and I can't seem to find a way to serialize arbitrary XML as HTML.
So, short of using XSLT solely to get HTML output (i.e. running the document through a nonsensical XDocument -> XmlReader -> XSLT -> HTML chain), is there a way to serialize a .NET XDocument to HTML?
No. XDocument -> XmlReader -> XSLT is the approach you need.
What you're looking for is a specialized serializer that attaches meaning to tag names such as br and div and renders each of them differently. You might also expect such a serializer to work both ways, in other words to be able to read HTML tag soup and generate an XDocument. Such a thing doesn't exist out of the box.
The XmlReader-to-XSLT approach is simple enough to set up; it is ultimately just a chain of streams.
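As a rough illustration of that chain, here is a minimal sketch that pushes an XDocument through an identity stylesheet whose only purpose is to select the HTML output method. The class name HtmlSerializer and the method name ToHtml are made up for this example, and it assumes the elements are in no namespace (elements in the XHTML namespace may not be treated as HTML elements by the serializer).

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

// Hypothetical helper; the names are illustrative, not part of any library.
public static class HtmlSerializer
{
    // Identity transform: copies every node unchanged and only
    // switches the output method to HTML.
    private const string IdentityXslt = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html' indent='no'/>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    public static string ToHtml(XDocument document)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(IdentityXslt)))
        {
            transform.Load(stylesheet);
        }

        using (var output = new StringWriter())
        using (XmlReader input = document.CreateReader())
        {
            // Writing to a TextWriter lets the transform apply the
            // serialization settings declared in xsl:output.
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Calling HtmlSerializer.ToHtml(XDocument.Parse("<html><body><div/><br/></body></html>")) should then emit the empty div with an explicit end tag and the br without one. Note that when the document contains a head element, the HTML output method may also insert a META content-type element.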
Like you, I am very surprised that the HTML output method is not exposed, and I am not aware of any way to get at it other than the XSLT route you already identified. When I ran into the same problem a couple of years ago, I wrote an XmlWriter wrapper class that forced WriteEndElement to call WriteFullEndElement on the base XmlWriter if the tag being processed was not in the list {"area", "base", "basefont", "bgsound", "br", "col", "embed", "frame", "hr", "isindex", "image", "img", "input", "link", "meta", "param", "spacer", "wbr"}.
This fixed the <div/> problem and was sufficient for me, as I wanted to write polyglot documents. I haven't found a way to make <br/> appear as <br>, but apart from the fact that the output won't validate as HTML 4.01, this doesn't cause any real problem. My guess is that if you really need this and don't want to use the XSLT method, you will have to write your own XmlWriter implementation.
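The wrapper idea can be sketched roughly as follows. This is not the original class: to keep it short it subclasses XmlTextWriter rather than wrapping an arbitrary XmlWriter, and the class name FullEndTagXmlWriter is invented for the example. It tracks the open elements on a stack and only allows the short empty-element form for names in the list above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical class name; a simplified stand-in for the wrapper described above.
public class FullEndTagXmlWriter : XmlTextWriter
{
    // Elements that may keep the short <tag /> form (list taken from the answer).
    private static readonly HashSet<string> VoidElements =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "area", "base", "basefont", "bgsound", "br", "col", "embed",
            "frame", "hr", "isindex", "image", "img", "input", "link",
            "meta", "param", "spacer", "wbr"
        };

    // Names of the currently open elements, innermost on top.
    private readonly Stack<string> openElements = new Stack<string>();

    public FullEndTagXmlWriter(TextWriter output) : base(output) { }

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        openElements.Push(localName);
        base.WriteStartElement(prefix, localName, ns);
    }

    public override void WriteEndElement()
    {
        string name = openElements.Pop();
        if (VoidElements.Contains(name))
            base.WriteEndElement();      // stays as <br />
        else
            base.WriteFullEndElement();  // <div /> becomes <div></div>
    }

    public override void WriteFullEndElement()
    {
        // Keep the stack in sync when the caller asks for a full end tag directly.
        openElements.Pop();
        base.WriteFullEndElement();
    }
}
```

A possible use is to create the writer over a StringWriter and call document.Root.WriteTo(writer), which avoids the XML declaration that document.Save(writer) would emit. As described above, void elements still come out in the self-closing XML form rather than as <br>.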
Of course there is!
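// Assumes: XDocument document; string filename;
// Requires: using System.Reflection; using System.Xml; using System.Xml.Linq;
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
// OutputMethod has no public setter, so set its private backing field via reflection.
// The field name "outputMethod" is an implementation detail and may differ between .NET versions.
typeof(XmlWriterSettings)
    .GetField("outputMethod", BindingFlags.NonPublic | BindingFlags.Instance)
    .SetValue(settings, XmlOutputMethod.Html);
using (XmlWriter xw = XmlWriter.Create(filename, settings))
{
    document.Save(xw);
}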