Complex XPath query with getNodeSet in R

I have the following XML file loaded from the UniProt protein database.

protein <- xmlRoot(xmlTreeParse("http://www.uniprot.org/uniprot/Q01974.xml"))

      

Of the many annotated functions, I'm interested in the start and end positions of the kinase domain, which are stored in the following XML node:

<feature type="domain" description="Protein kinase">
<location>
<begin position="288"/>
<end position="539"/>
</location>
</feature>

      

With getNodeSet, I can find this node without any trouble:

getNodeSet(protein, "//uniprot:feature[@type=\"domain\" and @description=\"Protein kinase\"]", c(uniprot="http://uniprot.org/uniprot"))

      

Unfortunately, I couldn't narrow the query down any further: adding another location step returns an empty list. For example:

getNodeSet(protein, "//uniprot:feature[@type=\"domain\" and @description=\"Protein kinase\"]/location", c(uniprot="http://uniprot.org/uniprot"))

      

According to an online XPath tester, this is a valid XPath query, but it returns an empty node set:

list()
attr(,"class")
[1] "XMLNodeSet"

      

Can anyone help me with this query? I'm sure this is the normal behavior of getNodeSet, but I don't understand the rationale behind it. In general, what is the most appropriate way to express such relatively complex queries in R? Should I store the result and then narrow it down?

Many thanks!



1 answer


Use the same prefix for the subsequent element:

//uniprot:feature[...]/uniprot:location

      



XPath identifies each element by its namespace plus local name, and a prefix is just shorthand for the namespace. If your XML declares a default namespace (which seems to be the case here), every unprefixed element in the document belongs to that namespace. An unprefixed name in an XPath expression, however, matches only elements in no namespace, which is why you need a prefix* on every element step in the XPath (not just the first one).

*) a prefix bound to the default namespace URI
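
As a rough illustration, the sketch below applies this to a small inline document instead of the live UniProt file. The inline XML is a simplified stand-in (an assumption, not the real entry), and the variable names `xml_text`, `ns`, `feature`, `begin_pos` and `end_pos` are made up for the example. It assumes the XML package is installed.

```r
library(XML)

# Simplified stand-in for the UniProt entry; it only reproduces the default
# namespace and the feature node from the question (assumption, not real data).
xml_text <- '<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <feature type="domain" description="Protein kinase">
      <location>
        <begin position="288"/>
        <end position="539"/>
      </location>
    </feature>
  </entry>
</uniprot>'

doc <- xmlParse(xml_text, asText = TRUE)

# Bind a prefix of our choosing to the document's default namespace URI.
ns <- c(uniprot = "http://uniprot.org/uniprot")

feature <- "//uniprot:feature[@type='domain' and @description='Protein kinase']"

# Unprefixed step: looks for <location> in no namespace, so nothing matches.
unprefixed <- getNodeSet(doc, paste0(feature, "/location"), ns)

# Prefixed step: matches the <location> element in the UniProt namespace.
prefixed <- getNodeSet(doc, paste0(feature, "/uniprot:location"), ns)

# Read the position attributes directly, prefixing every element step.
begin_pos <- as.integer(xpathSApply(
  doc, paste0(feature, "/uniprot:location/uniprot:begin"),
  xmlGetAttr, "position", namespaces = ns))
end_pos <- as.integer(xpathSApply(
  doc, paste0(feature, "/uniprot:location/uniprot:end"),
  xmlGetAttr, "position", namespaces = ns))

print(c(begin = begin_pos, end = end_pos))
```

Building the longer paths from the stored `feature` string is one way to narrow a query step by step without repeating the predicate. The same `ns` vector should work unchanged with the `protein` object from the question.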
