Complex xPath query with getNodeSet in R

Question

I have the following XML file loaded from the UniProt protein database.

protein <- xmlRoot(xmlTreeParse("http://www.uniprot.org/uniprot/Q01974.xml"))

Of the many annotated functions, I'm interested in the start and end position of the kinase domain stored in the following xml node:

<feature type="domain" description="Protein kinase">
<location>
<begin position="288"/>
<end position="539"/>
</location>
</feature>

With getNodeSet, I can find this node without any problem:

getNodeSet(protein, "//uniprot:feature[@type=\"domain\" and @description=\"Protein kinase\"]", c(uniprot="http://uniprot.org/uniprot"))

Unfortunately, I can't narrow the query down any further: adding any other step returns an empty list. Example:

getNodeSet(protein, "//uniprot:feature[@type=\"domain\" and @description=\"Protein kinase\"]/location", c(uniprot="http://uniprot.org/uniprot"))

According to an online XPath tester, this should be a valid XPath query, but it returns an empty node set:

list()
attr(,"class")
[1] "XMLNodeSet"

Can anyone help me with this query? I'm sure this is the normal behavior of getNodeSet, but I don't understand the rationale behind it. In general, what is the most appropriate way to express such relatively complex queries in R? Should I store the result and then narrow it down?

Many thanks!

xml r xpath

SDani Apr 22 14 at 5:23 am

1 answer

har07 · Accepted Answer · 2014-04-22T05:45:47+0000

Use the same prefix for the subsequent element as well:

//uniprot:feature[...]/uniprot:location

Each element is identified by its prefix plus its local name. If the XML declares a default namespace (which seems to be the case here), every unprefixed element in the document belongs to that default namespace. In an XPath expression, however, an unprefixed name only matches elements in no namespace, so /location finds nothing. This is why you need to use the prefix* for every element in the XPath, not just the first one.

*) a prefix that points to the default namespace URI
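
To make this concrete, here is a minimal sketch using the XML package. It assumes a cut-down inline copy of the record instead of the live UniProt URL (the wrapping uniprot and entry elements and all variable names are only illustrative), but it declares the same default namespace, so the prefix behavior should be the same:

```r
library(XML)

# Inline stand-in for the UniProt record (assumption: trimmed to one feature).
# The root declares a default namespace, as the real file does.
xml_text <- '<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <feature type="domain" description="Protein kinase">
      <location>
        <begin position="288"/>
        <end position="539"/>
      </location>
    </feature>
  </entry>
</uniprot>'

doc <- xmlParse(xml_text, asText = TRUE)

# Bind the prefix "uniprot" to the document's default namespace URI.
ns <- c(uniprot = "http://uniprot.org/uniprot")

base <- "//uniprot:feature[@type='domain' and @description='Protein kinase']"

# Unprefixed step: asks for a "location" element in no namespace.
n_unprefixed <- length(getNodeSet(doc, paste0(base, "/location"), ns))

# Prefixed steps: every element name carries the prefix.
begin_pos <- as.integer(xpathSApply(
  doc, paste0(base, "/uniprot:location/uniprot:begin"),
  xmlGetAttr, "position", namespaces = ns
))
end_pos <- as.integer(xpathSApply(
  doc, paste0(base, "/uniprot:location/uniprot:end"),
  xmlGetAttr, "position", namespaces = ns
))

print(n_unprefixed)
print(c(begin = begin_pos, end = end_pos))
```

Attributes such as type, description and position stay unprefixed here, because a default namespace declaration applies to element names only, not to attributes.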
