Using lxml to parse XML with multiple namespaces

I am pulling XML from a SOAP API; the response looks like this:

<SOAP-ENV:Envelope xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ae="urn:sbmappservices72" xmlns:c14n="http://www.w3.org/2001/10/xml-exc-c14n#" xmlns:diag="urn:SerenaDiagnostics" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:xenc="http://www.w3.org/2001/04/xmlenc#" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<SOAP-ENV:Header/>
<SOAP-ENV:Body>
    <ae:GetItemsByQueryResponse>
      <ae:return>
        <ae:item>
          <ae:id xsi:type="ae:ItemIdentifier">
            <ae:displayName/>
            <ae:id>10</ae:id>
            <ae:uuid>a9b91034-8f4d-4043-b9b6-517ba4ed3a33</ae:uuid>
            <ae:tableId>1541</ae:tableId>
            <ae:tableIdItemId>1541:10</ae:tableIdItemId>
            <ae:issueId/>
          </ae:id>

      

I can't for the life of me use findall to pull out something like tableId. Most lxml parsing tutorials don't cover namespaces, but the one on lxml.de does, and I am trying to follow it.

According to their tutorial, you should create a namespace dictionary, which I did like this:

r = tree.xpath('/e:SOAP-ENV/s:ae', 
        namespaces={'e': 'http://schemas.xmlsoap.org/soap/envelope/',
                    's': 'urn:sbmappservices72'})

      

But that doesn't seem to work: when I check the length of r, it is always 0:

print('length: ' + str(len(r)))  # <---- always equals 0

      

Since the URI for the second namespace starts with "urn:" rather than being a URL, I tried using the actual URL of the WSDL instead, but that gives me the same result.

Is there something obvious that I am missing? I just need to be able to pull out values like the one for tableIdItemId.

Any help would be greatly appreciated.


1 answer


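Your XPath doesn't match the XML structure: SOAP-ENV and ae are namespace prefixes, not element names, so each step has to name an actual element (Envelope, Body, and so on) qualified by a prefix from your namespaces dictionary. Try this instead: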

r = tree.xpath('/e:Envelope/e:Body/s:GetItemsByQueryResponse/s:return/s:item/s:id/s:tableId', 
        namespaces={'e': 'http://schemas.xmlsoap.org/soap/envelope/',
                    's': 'urn:sbmappservices72'})
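
To see the difference end to end, here is a minimal, self-contained sketch. It assumes a trimmed copy of the response from the question: the sample there is cut off, so the closing tags below are added only to make the document parseable, and the names SOAP and NS are just illustrative.

```python
from lxml import etree

# Trimmed copy of the response from the question. The closing tags are an
# assumption, added because the posted sample is truncated.
SOAP = b"""<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:ae="urn:sbmappservices72"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <SOAP-ENV:Header/>
  <SOAP-ENV:Body>
    <ae:GetItemsByQueryResponse>
      <ae:return>
        <ae:item>
          <ae:id xsi:type="ae:ItemIdentifier">
            <ae:displayName/>
            <ae:id>10</ae:id>
            <ae:uuid>a9b91034-8f4d-4043-b9b6-517ba4ed3a33</ae:uuid>
            <ae:tableId>1541</ae:tableId>
            <ae:tableIdItemId>1541:10</ae:tableIdItemId>
            <ae:issueId/>
          </ae:id>
        </ae:item>
      </ae:return>
    </ae:GetItemsByQueryResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

tree = etree.fromstring(SOAP)

# The prefixes 'e' and 's' are local aliases chosen for the XPath;
# only the URIs have to match the ones declared in the document.
NS = {'e': 'http://schemas.xmlsoap.org/soap/envelope/',
      's': 'urn:sbmappservices72'}

# Attempt from the question: no element is named "SOAP-ENV" or "ae",
# so the result list is empty.
wrong = tree.xpath('/e:SOAP-ENV/s:ae', namespaces=NS)

# Corrected path: every step names a real element, qualified by an alias.
right = tree.xpath(
    '/e:Envelope/e:Body/s:GetItemsByQueryResponse'
    '/s:return/s:item/s:id/s:tableId',
    namespaces=NS)

print(len(wrong))                 # 0
print([el.text for el in right])  # ['1541']
```

Note that xpath() returns a list of elements, so the value itself comes from each element's .text attribute.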

      

For small XML, you can use // instead of / to simplify the expression, for example:



r = tree.xpath('/e:Envelope/e:Body//s:tableId', 
        namespaces={'e': 'http://schemas.xmlsoap.org/soap/envelope/',
                    's': 'urn:sbmappservices72'})

      

/e:Body//s:tableId will find tableId no matter how deeply it is nested inside Body. Note, however, that // is usually slower than /, especially on huge XML, because it has to search the whole subtree instead of following a single path.
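
Since the question mentions findall, the same "any depth" search can be sketched with it as well. findall takes ElementPath expressions rather than full XPath, so the path is written relative to the element (.//). The document below is a shortened, closed stand-in for the truncated sample in the question, and the names SOAP and NS are illustrative. Because findall belongs to the ElementTree-compatible API, the sketch falls back to the standard library when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # findall() is part of the ElementTree-compatible API, so the
    # standard library works as a stand-in for this particular sketch.
    import xml.etree.ElementTree as etree

# Shortened stand-in for the response in the question (closing tags assumed).
SOAP = b"""<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:ae="urn:sbmappservices72">
  <SOAP-ENV:Body>
    <ae:GetItemsByQueryResponse>
      <ae:return>
        <ae:item>
          <ae:id>
            <ae:tableId>1541</ae:tableId>
            <ae:tableIdItemId>1541:10</ae:tableIdItemId>
          </ae:id>
        </ae:item>
      </ae:return>
    </ae:GetItemsByQueryResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

root = etree.fromstring(SOAP)

# Alias-to-URI mapping; the alias 's' is arbitrary, the URI is what counts.
NS = {'s': 'urn:sbmappservices72'}

# Option 1: prefixed name plus a namespaces mapping.
by_prefix = root.findall('.//s:tableIdItemId', namespaces=NS)

# Option 2: Clark notation, with the namespace URI spelled out in braces.
by_clark = root.findall('.//{urn:sbmappservices72}tableId')

print([el.text for el in by_prefix])  # ['1541:10']
print([el.text for el in by_clark])   # ['1541']
```

Clark notation avoids the mapping entirely, which can be convenient for one-off lookups; the mapping tends to read better once several paths share the same namespaces.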
