Trying to parse XML from string in Python

First, here is the XML string I am working with:

'<?xml version="1.0" encoding="UTF-8"?><metalink version="3.0" xmlns="http://www.metalinker.org/" xmlns:lcgdm="LCGDM:" generator="lcgdm-dav" pubdate="Fri, 11 Oct 2013 12:46:10 GMT"><files><file name="/lhcb/L"><size>173272912</size><resources><url type="https">https://test-kit.test.de:2880/pnfs/test.file</url><url type="https">https://test.grid.sara.nl:2882/pnfs/test.file</url></resources></file></files></metalink>'

      

I want to extract the text of the url elements. The following code works, but it is brittle because the element positions are hardcoded:

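import xml.etree.ElementTree as ET
root = ET.fromstring(xml_string)
# root[0][0][1] is files -> file -> resources, addressed purely by position
for entry in root[0][0][1].iter():
    print(entry.text)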

      

This only works as long as the XML structure stays exactly the same. I tried using XPath instead, but my queries on the tag names never returned any results.

Is this a problem with the xml string format or am I doing something wrong?


2 answers


You can use XPath (through the element's findall method) to get the urls, but since the root element declares xmlns="http://www.metalinker.org/", every element in the document belongs to that namespace, and you need to include it in the XPath expression as well.

Example:



>>> from xml.etree.ElementTree import fromstring
>>> root = fromstring(xml_string)
>>> urls = root.findall('.//{http://www.metalinker.org/}url')
>>> for url in urls:
...     print(url.text)
...
https://test-kit.test.de:2880/pnfs/test.file
https://test.grid.sara.nl:2882/pnfs/test.file

      

The XPath expression above finds every url element in the document, at any depth.
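If writing the full namespace URI in braces gets unwieldy, findall also accepts a prefix-to-URI mapping as its second argument. The following is a minimal sketch of that variant, not part of the original answer: the prefix "ml", the constant NAMESPACES, and the helper extract_urls are names chosen here for illustration, and the XML is an abbreviated copy of the string from the question.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the document from the question (extra attributes dropped).
xml_string = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<metalink version="3.0" xmlns="http://www.metalinker.org/">'
    '<files><file name="/lhcb/L"><size>173272912</size><resources>'
    '<url type="https">https://test-kit.test.de:2880/pnfs/test.file</url>'
    '<url type="https">https://test.grid.sara.nl:2882/pnfs/test.file</url>'
    '</resources></file></files></metalink>'
)

# The prefix "ml" is arbitrary; only the URI has to match the document's xmlns.
NAMESPACES = {"ml": "http://www.metalinker.org/"}


def extract_urls(xml_text):
    """Return the text of every url element in the metalink namespace."""
    root = ET.fromstring(xml_text)
    return [url.text for url in root.findall(".//ml:url", NAMESPACES)]


if __name__ == "__main__":
    for url in extract_urls(xml_string):
        print(url)
```

The prefix used in the query does not need to appear in the document itself; ElementTree expands it to the {uri}tag form before matching.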


You used namespaces, so you need to use them in XPath:



for entry in root.findall('.//{http://www.metalinker.org/}url'):
    print(entry.text)
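If the namespace URI is not known in advance, one possible workaround is to compare only the local part of each tag. This is a sketch under that assumption, with hypothetical helper names (local_name, find_by_local_name) and a shortened sample document; on Python 3.8 and later, root.findall('.//{*}url') should give the same result using the built-in namespace wildcard.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same default namespace as the question's document.
sample = (
    '<metalink xmlns="http://www.metalinker.org/"><files><file name="/lhcb/L">'
    '<resources>'
    '<url type="https">https://test-kit.test.de:2880/pnfs/test.file</url>'
    '<url type="https">https://test.grid.sara.nl:2882/pnfs/test.file</url>'
    '</resources></file></files></metalink>'
)


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def find_by_local_name(root, name):
    """Return all elements under root whose tag matches name, ignoring namespaces."""
    return [
        element
        for element in root.iter()
        # Skip nodes whose tag is not a string (e.g. comments, if retained).
        if isinstance(element.tag, str) and local_name(element.tag) == name
    ]


if __name__ == "__main__":
    for entry in find_by_local_name(ET.fromstring(sample), "url"):
        print(entry.text)
```

Ignoring the namespace is convenient but less strict: it would also match url elements from any other namespace that happens to appear in the document.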

      
