Complex XPath query with getNodeSet in R
I have the following XML file, loaded from the UniProt protein database:
library(XML)
protein <- xmlRoot(xmlTreeParse("http://www.uniprot.org/uniprot/Q01974.xml"))
Of the many annotated features, I'm interested in the start and end positions of the kinase domain, which are stored in the following XML node:
<feature type="domain" description="Protein kinase">
<location>
<begin position="288"/>
<end position="539"/>
</location>
</feature>
With getNodeSet I can find this node without problems:
getNodeSet(protein, "//uniprot:feature[@type=\"domain\" and @description=\"Protein kinase\"]", c(uniprot="http://uniprot.org/uniprot"))
Unfortunately, I can't narrow the query down any further: adding any further step or criterion returns an empty list. For example:
getNodeSet(protein, "//uniprot:feature[@type=\"domain\" and @description=\"Protein kinase\"]/location", c(uniprot="http://uniprot.org/uniprot"))
According to an online XPath tester this should be a valid query, but it returns an empty node set:
list()
attr(,"class")
[1] "XMLNodeSet"
Can anyone help me with this query? I assume this is the normal behavior of getNodeSet, but I don't understand the rationale behind it. More generally, what is the most appropriate way to express such relatively complex queries in R? Should I store the result and then narrow it down?
Many thanks!
Use the same prefix for the subsequent element as well:
//uniprot:feature[...]/uniprot:location
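A minimal self-contained sketch of the corrected query, assuming the XML package and, instead of the live UniProt URL, a hypothetical inline copy of just the relevant fragment under a root that declares the UniProt default namespace. The names `xml_text`, `feature_path`, `begin_pos` and `end_pos` are illustrative only. It also reads the two `position` attributes with `xmlGetAttr`, which is what the question is ultimately after.

```r
library(XML)

# Hypothetical, trimmed-down stand-in for the UniProt entry: only the
# <feature> from the question, under a root that declares the UniProt
# default namespace (the real file is much larger).
xml_text <- '<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <feature type="domain" description="Protein kinase">
      <location>
        <begin position="288"/>
        <end position="539"/>
      </location>
    </feature>
  </entry>
</uniprot>'

doc <- xmlParse(xml_text, asText = TRUE)
ns  <- c(uniprot = "http://uniprot.org/uniprot")

# Shared part of the path; every element step carries the prefix.
feature_path <- '//uniprot:feature[@type="domain" and @description="Protein kinase"]'

# The query from the question, with the prefix added to "location".
loc <- getNodeSet(doc, paste0(feature_path, "/uniprot:location"), namespaces = ns)
length(loc)  # 1

# Read the position attribute of <begin> and <end>; xmlGetAttr is applied
# to each matched node and "position" is passed through as its argument.
begin_pos <- as.integer(xpathSApply(doc, paste0(feature_path, "/uniprot:location/uniprot:begin"),
                                    xmlGetAttr, "position", namespaces = ns))
end_pos   <- as.integer(xpathSApply(doc, paste0(feature_path, "/uniprot:location/uniprot:end"),
                                    xmlGetAttr, "position", namespaces = ns))
c(begin_pos, end_pos)  # 288 539
```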
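An element is identified by its namespace URI plus its local name; in an XPath expression the prefix stands in for the URI. If your XML has a default namespace (which seems to be the case here), every unprefixed element in the document belongs to that default namespace. XPath 1.0, however, has no notion of a default namespace: an unprefixed name in an expression always means "no namespace", so a bare `location` step can never match UniProt's `<location>`. That is why you need to use the prefix (one that points to the default namespace URI) on every element step in the XPath, not just the first one.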
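To illustrate the rule, and the question of whether to store a result and narrow it down, here is a small sketch on a hypothetical inline document (XML package assumed; variable names are illustrative). It contrasts an unprefixed and a prefixed step, runs a relative query from a stored `<feature>` node, and shows `local-name()` as a prefix-free alternative. The `local-name()` approach is a swapped-in technique, not something the answer above proposes.

```r
library(XML)

# Hypothetical minimal document with a default namespace, mirroring the
# structure of the UniProt <feature> from the question.
doc <- xmlParse('<uniprot xmlns="http://uniprot.org/uniprot">
  <feature type="domain" description="Protein kinase">
    <location><begin position="288"/><end position="539"/></location>
  </feature>
</uniprot>', asText = TRUE)
ns <- c(uniprot = "http://uniprot.org/uniprot")

# Unprefixed "location" means "location in no namespace": nothing matches.
unprefixed <- getNodeSet(doc, "//uniprot:feature/location", namespaces = ns)
# Prefixed on every step: matches the <location> in the UniProt namespace.
prefixed <- getNodeSet(doc, "//uniprot:feature/uniprot:location", namespaces = ns)

# Store-and-narrow: keep the <feature> node, then query relative to it.
# The leading "." makes the path start at that node; the prefix is still needed.
feature <- getNodeSet(doc,
  '//uniprot:feature[@type="domain" and @description="Protein kinase"]',
  namespaces = ns)[[1]]
end_pos <- as.integer(xpathSApply(feature, "./uniprot:location/uniprot:end",
                                  xmlGetAttr, "position", namespaces = ns))

# Namespace-agnostic alternative: match on local-name() instead of a prefix.
by_local_name <- getNodeSet(doc,
  "//*[local-name()='feature']/*[local-name()='location']", namespaces = ns)
```

Writing one full path and storing a node to narrow down later select the same elements; narrowing is mostly a convenience when several values are needed from the same node. The `local-name()` form avoids prefixes entirely but also ignores namespaces, so it can match same-named elements from other vocabularies.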