Parsing XML using PHP - including ampersands and other characters

I am trying to parse an XML file and one of the fields looks like this:

<link>http://foo.com/this-platform/scripts/click.php?var_a=a&var_b=b&varc=http%3A%2F%2Fwww.foo.com%2Fthis-section-here%2Fperf%2F229408%3Fvalue%3D0222%26some_variable%3Dmeee</link>

      

This seems to break the parser. I think it might be related to the & characters in the link?

My code is pretty simple:

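<?php

// simplexml_load_file() returns false if the file is not well-formed XML.
$xml = simplexml_load_file("files/this.xml");

echo $xml->getName() . "<br />";

// Print the name and text of each direct child element.
foreach ($xml->children() as $child) {
  echo $child->getName() . ": " . $child . "<br />";
}
?>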

      

Any ideas how I can solve this?



mjv's comment solved it:

As an alternative to escaping & as &amp;, you might consider placing URLs and other XML-unfriendly content in a <![CDATA[ ... ]]> section, i.e. a character data block.
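A minimal sketch of the CDATA approach, assuming the feed can be changed so the link is wrapped in a CDATA section (the URL here is a shortened version of the one in the question). Inside the CDATA section the raw & characters are legal, and SimpleXML returns them unchanged.

```php
<?php
// Hypothetical feed in which the URL is wrapped in a CDATA section,
// so the raw & characters do not need to be escaped.
$str = <<<XML
<xml>
  <link><![CDATA[http://foo.com/this-platform/scripts/click.php?var_a=a&var_b=b]]></link>
</xml>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes
// (not needed for a string cast, but it makes var_dump/print_r show the text).
$xml = simplexml_load_string($str, 'SimpleXMLElement', LIBXML_NOCDATA);

// Casting to string yields the URL with its literal & characters.
echo (string) $xml->link, "\n";
```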



The XML fragment you posted is not valid: the ampersands must be escaped, which is why the parser complains.
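To see what the parser is actually complaining about, a sketch like the following (assuming a shortened version of the question's link) collects libxml's errors instead of letting them surface as PHP warnings. The exact wording of the messages depends on the libxml version, so none is quoted here.

```php
<?php
// Shortened version of the question's link, with raw & characters.
$str = '<xml><link>http://foo.com/click.php?var_a=a&var_b=b</link></xml>';

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$xml = simplexml_load_string($str);

if ($xml === false) {
    // Each LibXMLError carries the message and the line number.
    foreach (libxml_get_errors() as $error) {
        echo "Line {$error->line}: ", trim($error->message), "\n";
    }
    libxml_clear_errors();
}
```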





Your XML feed is not valid XML: & must be escaped as &amp;.

This means you cannot use an XML parser on it as-is :-(

A possible "solution" (it feels wrong, but should work) would be to replace every & that is not part of an entity with &amp;, to get a valid XML string before loading it with the XML parser.


In your case, given this:

$str = <<<STR
<xml>
  <link>http://foo.com/this-platform/scripts/click.php?var_a=a&var_b=b&varc=http%3A%2F%2Fwww.foo.com%2Fthis-section-here%2Fperf%2F229408%3Fvalue%3D0222%26some_variable%3Dmeee</link>
</xml>
STR;

      

You can use a simple call to str_replace, like:

$str = str_replace('&', '&amp;', $str);

      

And then parse the string in $str (which is now valid XML):

$xml = simplexml_load_string($str);
var_dump($xml);

      

In this case, it should work ...


But keep in mind that you must take care of entities: if you already have an entity such as &gt;, you must not replace it with &amp;gt;!

This means that such a simple str_replace call is not the right solution: it will probably break things in many XML feeds!

It's up to you to figure out the correct way to do this replacement - perhaps with some kind of regex ...
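One possible regex, offered as a sketch rather than a complete fix: escape every & that is not already followed by something shaped like an entity or character reference (&name;, &#38;, &#x26;). The helper name escape_bare_ampersands is hypothetical. The assumption here is that anything matching &name; is treated as an existing entity, so names XML does not predefine (such as &nbsp;) are left alone and will still make the parser fail.

```php
<?php
/**
 * Hypothetical helper: escape every & that does not already start an
 * entity or character reference (e.g. &amp;, &gt;, &#38;, &#x26;).
 * Assumption: anything shaped like &name; is treated as an existing entity,
 * even names XML does not predefine (such as &nbsp;).
 */
function escape_bare_ampersands(string $xml): string
{
    return preg_replace(
        '/&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $xml
    );
}

$str = '<xml><link>http://foo.com/click.php?var_a=a&var_b=b</link><note>a &gt; b</note></xml>';

$fixed = escape_bare_ampersands($str);
// The bare & in the URL becomes &amp;, while &gt; is left untouched.
echo $fixed, "\n";

$xml = simplexml_load_string($fixed);
echo (string) $xml->link, "\n";
```

Unlike the plain str_replace call, this leaves the existing &gt; intact instead of turning it into &amp;gt;.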



It breaks the parser because your XML is invalid: & must be encoded as &amp;.
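If you control whatever produces the feed, a sketch of doing that encoding at the source, assuming the XML is built as a string in PHP: htmlspecialchars() with the ENT_XML1 flag (available since PHP 5.4) turns & into &amp;, and SimpleXML decodes it back when reading. The URL is an illustrative shortened version of the question's link.

```php
<?php
// URL containing raw & characters (illustrative value).
$url = 'http://foo.com/this-platform/scripts/click.php?var_a=a&var_b=b';

// ENT_XML1 | ENT_QUOTES escapes &, <, > and both quote characters
// using XML's predefined entities.
$escaped = htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8');

$str = "<xml>\n  <link>{$escaped}</link>\n</xml>";
echo $str, "\n";

// The escaped document now parses, and SimpleXML decodes &amp; back to &.
$xml = simplexml_load_string($str);
echo (string) $xml->link, "\n";
```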







