Parsing XML and Converting to PHP?
I have defined a custom XML schema for rendering a page: elements are placed on the page by evaluating the XML elements in its source. This is currently implemented using the preg regex functions, most notably the excellent preg_replace_callback function, for example:
...
$s = preg_replace_callback("!<field>(.*?)</field>!", "replace_field", $s);
...
function replace_field($groups) {
    global $fields; // lookup table of field values, defined elsewhere
    return isset($fields[$groups[1]]) ? $fields[$groups[1]] : "";
}
As an example.
Now this works pretty well ... as long as the XML elements are not nested. It gets a lot more complicated at this point, for example if you have:
<field name="outer">
<field name="inner">
...
</field>
</field>
First, you want to replace the innermost field first. Judicious use of greedy / non-greedy regex patterns might go some way toward handling these more complex scenarios, but the clear message is that I am reaching the limits of what a regex can reasonably do and really need to parse the XML.
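To make the nesting problem concrete, here is a minimal sketch. The pattern is an assumed variant of the one above, extended to tolerate attributes; it is not the code actually in use. A non-greedy match pairs the outer opening tag with the inner closing tag:

```php
<?php
// Nested input like the example above, written on one line.
$s = '<field name="outer"><field name="inner">x</field></field>';

// Assumed pattern: any <field ...> opening tag, then the shortest
// possible body, then the first closing tag that follows.
preg_match('!<field[^>]*>(.*?)</field>!s', $s, $m);

// The match starts at the OUTER opening tag but stops at the INNER
// closing tag, so the captured body is an unbalanced fragment.
echo $m[1], "\n"; // <field name="inner">x
```

Switching to a greedy `.*` fixes this particular input but then swallows sibling fields, which is why the element boundaries really need a parser.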
What I need is an XML transform package that:
- allows me to conditionally evaluate / include the contained document tree or not, ideally based on a callback function (similar to preg_replace_callback);
- can handle nested elements of one or more types; and
- handles attributes in a nice way (as an associative array, for example).
What can help me along the way?
The PHP XSLTProcessor class (ext/xsl; PHP 5 includes the XSL extension by default, and it can be enabled by adding the argument --with-xsl[=DIR] to your configure line) is quite sophisticated and allows, among other things, the use of PHP functions within your XSL document via XSLTProcessor::registerPHPFunctions().
The following example is shamelessly lifted from the PHP manual page:
$xml = '<allusers>
<user>
<uid>bob</uid>
</user>
<user>
<uid>joe</uid>
</user>
</allusers>';
$xsl = '<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:php="http://php.net/xsl">
<xsl:output method="html" encoding="utf-8" indent="yes"/>
<xsl:template match="allusers">
<html><body>
<h2>Users</h2>
<table>
<xsl:for-each select="user">
<tr><td>
<xsl:value-of
select="php:function(\'ucfirst\',string(uid))"/>
</td></tr>
</xsl:for-each>
</table>
</body></html>
</xsl:template>
</xsl:stylesheet>';
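// loadXML() is an instance method; calling it statically fails on current PHP.
$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);
$xsldoc = new DOMDocument();
$xsldoc->loadXML($xsl);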
$proc = new XSLTProcessor();
$proc->registerPHPFunctions();
$proc->importStyleSheet($xsldoc);
echo $proc->transformToXML($xmldoc);
You can use XSL for this: just match the inner templates first.
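A rough sketch of how nested elements might be handled this way, assuming the xsl extension is loaded. The `field_value()` callback and its lookup table are hypothetical stand-ins for the question's `replace_field()`, and the bracketed text output is only there to make the nesting visible:

```php
<?php
// Hypothetical callback: maps a field name to its replacement text.
function field_value($name) {
    $fields = array('outer' => 'O', 'inner' => 'I'); // assumed sample data
    return isset($fields[$name]) ? $fields[$name] : '';
}

$xml = '<page><field name="outer"><field name="inner"/></field></page>';

$xsl = '<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:php="http://php.net/xsl"
    exclude-result-prefixes="php">
  <xsl:output method="text"/>
  <!-- One template handles every <field>, at any nesting depth. -->
  <xsl:template match="field">
    <xsl:text>[</xsl:text>
    <xsl:value-of select="php:function(\'field_value\', string(@name))"/>
    <!-- Recurse into child nodes, so nested fields are processed too. -->
    <xsl:apply-templates/>
    <xsl:text>]</xsl:text>
  </xsl:template>
</xsl:stylesheet>';

$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);
$xsldoc = new DOMDocument();
$xsldoc->loadXML($xsl);

$proc = new XSLTProcessor();
$proc->registerPHPFunctions();
$proc->importStyleSheet($xsldoc);
$out = $proc->transformToXML($xmldoc);
echo $out, "\n";
```

The recursion comes from `xsl:apply-templates`, so the stylesheet does not need to know how deep the fields are nested.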
Here's a good starting point for learning what you can do with XSL:
http://www.w3schools.com/xsl/
You can run the XSL transform on the server or on the client (using JS, ActiveX, or others).
If you still hate the XSL idea, take a look at the XML parsing built into PHP: search for the PHP SAX parser, a callback-based API for building your own custom parser, currently backed by libxml2.
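As a minimal sketch of that callback style, assuming PHP 5.3+ for the closures: the `collect_fields()` helper below is a hypothetical name, not a built-in. It records every `<field>` element together with its nesting depth and its attributes, which the parser hands over as an associative array:

```php
<?php
// Hypothetical helper: returns a list of <field> elements found in $xml,
// or false if the document is not well-formed.
function collect_fields($xml) {
    $found = array();
    $depth = 0;

    $parser = xml_parser_create();
    // Keep element and attribute names as written (default is upper-casing).
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Start-tag callback: $attrs is an associative array name => value.
        function ($p, $name, $attrs) use (&$found, &$depth) {
            $depth++;
            if ($name === 'field') {
                $found[] = array('depth' => $depth, 'attrs' => $attrs);
            }
        },
        // End-tag callback: leaving an element, so go one level up.
        function ($p, $name) use (&$depth) {
            $depth--;
        }
    );

    $ok = xml_parse($parser, $xml, true);
    return $ok ? $found : false;
}

$doc = '<page><field name="outer"><field name="inner">x</field></field></page>';
$result = collect_fields($doc);
print_r($result);
```

Because the parser tracks the open and close events itself, nesting needs no special handling beyond the depth counter.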
Definitely not regular expressions. XML documents can be reformatted in ways that do not change their content (in other words, ways that are invisible to XML processing libraries) but that matter to regular expressions. That kind of code quickly becomes a maintenance nightmare.
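A small illustration of the point, using made-up sample documents and two hypothetical helper functions. All three variants carry the same content as far as an XML parser is concerned, but only the first suits the regex from the question:

```php
<?php
// Three spellings of the same document content (assumed sample data).
$variants = array(
    '<page><field>title</field></page>',
    "<page><field\n>title</field ></page>",          // whitespace inside the tags
    '<page><field><![CDATA[title]]></field></page>', // text wrapped in CDATA
);

// Regex approach, using the pattern from the question.
function field_via_regex($xml) {
    return preg_match('!<field>(.*?)</field>!', $xml, $m) ? $m[1] : null;
}

// Parser approach: let DOM work out where the element starts and ends.
function field_via_dom($xml) {
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $node = $doc->getElementsByTagName('field')->item(0);
    return $node ? $node->textContent : null;
}

foreach ($variants as $xml) {
    var_dump(field_via_regex($xml), field_via_dom($xml));
}
```

The DOM version returns the same text for every variant, while the regex misses the second and returns the raw CDATA markup for the third.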
As for using the parser (SAX, StAX, DOM, JDOM, dom4j, XOM, etc.)