Output a variable as CDATA in XML with XMLWriter

I am trying to build a web service in PHP that an application can contact; it reads data from a database and returns it to the application as XML. However, one of the columns contains HTML and should (I think) be output as CDATA. I'm having trouble with that. Please advise.

<?php
mysql_connect(DB_HOST, DB_USER, DB_PASSWORD);
mysql_select_db(DB_NAME);

$sql = "SELECT post_date_gmt, post_content, post_title FROM [schema].wp_posts WHERE post_status = \"publish\" && post_type = \"post\" ORDER BY post_date_gmt DESC;";
$res = mysql_query($sql);

$xml = new XMLWriter();

$xml->openURI("php://output");
$xml->startDocument();
$xml->setIndent(true);

$xml->startElement('BlogPosts');

while ($row = mysql_fetch_assoc($res)) {

    $xml->startElement("Post");

    $xml->startElement("PostDate");
    $xml->writeRaw($row['post_date_gmt']);
    $xml->endElement();

    $xml->startElement("PostTitle");
    $xml->$writeRaw($row['post_title']);
    $xml->endElement();

    $xml->startCData("PostContent");
    $xml->writeCData($row['post_content']);
    $xml->endCData();

    $xml->endElement();

}

$xml->endElement();

header('Content-type: text/xml');
$xml->flush();

?>

      

Thank you for any help you could offer!

3 answers


Do not use XMLWriter::writeRaw() unless you really want to write XML fragments directly. "Raw" means the library does no escaping.

The correct way to write text to an XML document is XMLWriter::text().

$xml->startElement('PostTitle');
$xml->text('foo & bar');
$xml->endElement();

      

Output:

<?xml version="1.0"?>
<PostTitle>foo &amp; bar</PostTitle>

      

If you use XMLWriter::writeRaw() in this example, the result will contain an unescaped & and be invalid XML.
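To see the difference directly, here is a minimal sketch that writes the same title once with text() and once with writeRaw(), then tries to parse each result with DOMDocument. The helper names build_title() and is_well_formed() are made up for this illustration, and openMemory() is used instead of php://output so the result can be inspected.

```php
<?php
// Hypothetical helper: writes <PostTitle> using either text() or writeRaw().
function build_title(string $title, bool $raw): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->startElement('PostTitle');
    if ($raw) {
        $xml->writeRaw($title);   // written as is, no escaping
    } else {
        $xml->text($title);       // special characters are escaped
    }
    $xml->endElement();
    return $xml->outputMemory();
}

// Hypothetical helper: true if the string parses as XML.
function is_well_formed(string $xmlString): bool
{
    $previous = libxml_use_internal_errors(true); // collect errors quietly
    $dom = new DOMDocument();
    $ok = $dom->loadXML($xmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$escaped = build_title('foo & bar', false);
$raw     = build_title('foo & bar', true);

var_dump(is_well_formed($escaped)); // bool(true)
var_dump(is_well_formed($raw));     // bool(false)
```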

CDATA sections are character nodes, much like text nodes, but they allow special characters without escaping and preserve whitespace. You should always create the element node separately. An element node can contain multiple other nodes, even multiple CDATA sections.

XMLWriter has two ways to create CDATA sections:

A single method:

$xml->startElement("PostContent");
$xml->writeCData('<b>post</b> content');
$xml->endElement();

      

Output:

<?xml version="1.0"?>
<PostContent><![CDATA[<b>post</b> content]]></PostContent>

      

Or the start/end methods:

$xml->startElement("PostContent");
$xml->startCData();
$xml->text('<b>post</b> content');
$xml->text(' more content');
$xml->endCData();
$xml->endElement();

      

Output:

<?xml version="1.0"?>
<PostContent><![CDATA[<b>post</b> content more content]]></PostContent>
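Applied to the loop from the question, the two points above might look like the following sketch. The $rows array is a made-up stand-in for the database result, and openMemory() replaces openURI("php://output") only so the result can be checked. Content that itself contains "]]>" is not handled here.

```php
<?php
// Stand-in for the database rows; the values are invented for illustration.
$rows = [
    [
        'post_date_gmt' => '2013-01-01 12:00:00',
        'post_title'    => 'Foo & Bar',
        'post_content'  => '<p>Hello <b>world</b></p>',
    ],
];

$xml = new XMLWriter();
$xml->openMemory();              // the question uses openURI("php://output")
$xml->setIndent(true);
$xml->startDocument();
$xml->startElement('BlogPosts');

foreach ($rows as $row) {
    $xml->startElement('Post');

    // writeElement() escapes its content like text() does.
    $xml->writeElement('PostDate', $row['post_date_gmt']);
    $xml->writeElement('PostTitle', $row['post_title']);

    // The element is created separately; the CDATA section goes inside it.
    $xml->startElement('PostContent');
    $xml->writeCData($row['post_content']);
    $xml->endElement();

    $xml->endElement(); // Post
}

$xml->endElement();     // BlogPosts
$xml->endDocument();

$result = $xml->outputMemory();
echo $result;
```

When writing to php://output instead, the Content-type header has to be sent before the first output reaches the client.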

      


You can simply add it to the elements you need to wrap with CDATA, for example:



 $xml->writeRaw('<![CDATA['.$row['post_date_gmt'].']]>');

      


The answer from ThW is generally correct and the way to go. It explains well how the XMLWriter interface in PHP is meant to be used.

Credit also goes to him for most of the work behind this more nuanced answer, as we discussed the question in chat yesterday.

There are, however, some limitations to CDATA in XML, and these also apply to the two described ways of using XMLWriter for CDATA:

The string ']]>' cannot be placed in a CDATA section, so nested CDATA sections are not allowed (validity constraint).

From: CDATA Sections (compare section 2.7, CDATA Sections, of the XML specification)

Typically, XMLWriter accepts string data that is not yet encoded. For example, if you pass in some text, it is written out properly encoded (unless you use XMLWriter::writeRaw()).

But if you start a CDATA section and then write text, or write CDATA directly, the string passed in must not end the section or contain another CDATA section. This means it cannot contain the character sequence "]]>", as this would prematurely end the CDATA section.

Thus, it is the responsibility of the user of these methods to pass valid data to the XMLWriter.

This is usually trivial (for single-octet, binary character encodings based on US-ASCII, and for Unicode UTF-8). Here is some sample code:

/**
 * prepare text for CDATA section to prevent invalid or nested CDATA
 *
 * @param $string
 *
 * @return string
 * @link http://www.w3.org/TR/REC-xml/#sec-cdata-sect
 */
function xmlwriter_prepare_cdata_text($string) {
    return str_replace(']]>', ']]]]><![CDATA[>', (string) $string);
}

      

And an example of use:

$xml = new XMLWriter();
$xml->openURI("php://output");
$xml->startDocument();

$xml->startElement("PostContent");
$xml->writeCData(xmlwriter_prepare_cdata_text('<![CDATA[Foo & Bar]]>'));
$xml->endElement();

$xml->endDocument();

      

Example output:

<?xml version="1.0"?>
<PostContent><![CDATA[<![CDATA[Foo & Bar]]]]><![CDATA[>]]></PostContent>
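As a quick sanity check, the split CDATA sections should read back as the original string. The sketch below repeats the replacement under a hypothetical name, cdata_safe(), so that it runs on its own, writes to memory, and parses the result with DOMDocument.

```php
<?php
// Same replacement as the helper above, under a hypothetical name
// so this snippet is self-contained.
function cdata_safe(string $text): string
{
    // Split every "]]>" across two adjacent CDATA sections.
    return str_replace(']]>', ']]]]><![CDATA[>', $text);
}

$original = '<![CDATA[Foo & Bar]]>';

$xml = new XMLWriter();
$xml->openMemory();
$xml->startElement('PostContent');
$xml->writeCData(cdata_safe($original));
$xml->endElement();
$written = $xml->outputMemory();

// Parse the result and join the character data of both CDATA sections.
$dom = new DOMDocument();
$dom->loadXML($written);
$readBack = $dom->documentElement->textContent;

var_dump($readBack === $original); // bool(true)
```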

      

DOMDocument does something very similar under the hood:

$dom = new DOMDocument();
$dom->appendChild(
    $dom->createElement('PostContent')
);
$dom->documentElement->appendChild(
    $dom->createCdataSection('<![CDATA[Foo & Bar]]>')
);
$dom->save("php://output");

      

Output:

<?xml version="1.0"?>
<PostContent><![CDATA[<![CDATA[Foo & Bar]]]]><![CDATA[>]]></PostContent>

      

To understand technically why PHP's XMLWriter behaves this way, you need to know that XMLWriter is based on the libxml2 library. For most of its work, the PHP extension simply passes calls through to libxml:

PHP's xmlwriter_write_cdata delegates to libxml's xmlTextWriterWriteCDATA, which performs the expected sequence of xmlTextWriterStartCDATA, xmlTextWriterWriteString and xmlTextWriterEndCDATA.

xmlTextWriterWriteString is used in many routines (for example, for writing a PI), but the content string is encoded only in some text-writing cases:

  • Name,
  • Text and
  • Attribute.

In all other cases, it is passed through as is. This includes CDATA, so the data passed to XMLWriter::writeCData() must meet the requirements for XML CData (because it is written with this method):

  • [20] CData ::= (Char* - (Char* ']]>' Char*))

Which technically says: any string that does not contain "]]>".

This is easy to overlook; I myself suspected yesterday that it might be a bug. And I'm not the only one: there is a related bug report on PHP.net from years ago: https://bugs.php.net/bug.php?id=44619
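The state-dependent encoding can be observed with a small experiment. This is only a sketch, assuming the libxml behavior described above: the same string is written as an attribute value, as element text, and inside a CDATA section. The element and attribute names are arbitrary.

```php
<?php
$xml = new XMLWriter();
$xml->openMemory();

$xml->startElement('Demo');
$xml->writeAttribute('note', 'a & b'); // attribute state: encoded
$xml->text('a & b');                   // text state: encoded
$xml->startCData();
$xml->text('a & b');                   // CDATA state: passed through as is
$xml->endCData();
$xml->endElement();

$out = $xml->outputMemory();
echo $out, "\n";

// The CData production boils down to this check on the input string.
$isValidCData = strpos('a & b', ']]>') === false;
var_dump($isValidCData);
```

Only the attribute value and the element text should come out with &amp;, while the CDATA section keeps the bare &.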

See also: What does <![CDATA[]]> in XML mean?
