Perl libXML find node by attribute value

I have a very large XML document that I am iterating over. The XML uses mostly attributes rather than node values. I may need to find many nodes in the file to collect one group of information; they are related to each other using different ref tag values. Currently, every time I need to find one of the nodes to retrieve data, I loop over the entire XML and compare the attribute to find the correct node. Is there a more efficient way to select the node with a given attribute value directly, instead of a brute-force loop and comparison? My current code is so slow that it is almost useless.

I am currently doing something like this many times in the same file for many different nodes and attribute combinations.

my $searchID = "1234";
foreach my $nodes ($xc->findnodes('/plm:PLMXML/plm:ExternalFile')) {
    my $ID      = $nodes->findvalue('@id');
    my $File    = $nodes->findvalue('@locationRef');
    if ( $searchID eq $ID ) {
        print "The File Name = $File\n";
    }
}

      

In the above example, I am looping and using an "if" comparison to match ids. I was hoping I could do something like the code below to match a node directly by its attribute value. Would that be more efficient than a loop?

my $searchID = "1234";
$nodes = ($xc->findnodes('/plm:PLMXML/plm:ExternalFile[@id=$searchID]'));
my $File    = $nodes->findvalue('@locationRef');
print "The File Name = $File\n";

      

+3




4 answers


Make one pass over the document to extract the information you need into a more convenient format, or to build an index.

my %nodes_by_id;
for my $node ($xc->findnodes('//*[@id]')) {
    $nodes_by_id{ $node->getAttribute('id') } = $node;
}

      

Then each of your loops becomes a simple hash lookup:



my $node = $nodes_by_id{'1234'};

      

(And stop using findvalue for this; use getAttribute instead.)
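The indexing idea above can be sketched as a self-contained script. The sample document, its namespace URI, and the file names are assumptions made up for illustration; only the findnodes and getAttribute calls come from the answer itself.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical sample data; the namespace URI is an assumption.
my $xml = <<'XML';
<PLMXML xmlns="http://www.plmxml.org/Schemas/PLMXMLSchema">
  <ExternalFile id="1234" locationRef="part_a.prt"/>
  <ExternalFile id="5678" locationRef="part_b.prt"/>
</PLMXML>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xc  = XML::LibXML::XPathContext->new($doc);
$xc->registerNs(plm => 'http://www.plmxml.org/Schemas/PLMXMLSchema');

# One pass over the document: map each id value to its element.
my %nodes_by_id;
for my $node ($xc->findnodes('//*[@id]')) {
    $nodes_by_id{ $node->getAttribute('id') } = $node;
}

# Every later lookup is a hash access instead of a document scan.
my $file = $nodes_by_id{'1234'};
print "The File Name = ", $file->getAttribute('locationRef'), "\n" if $file;
```

If the same id value can appear on more than one element, this sketch keeps only the last one seen; storing an array reference per key would be one way around that.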

+2




If you will be doing this for a lot of identifiers, then ikegami's answer is worth reading.

I was hoping I could do something like this below to just match node by attribute

...

$nodes = ($xc->findnodes('/plm:PLMXML/plm:ExternalFile[@id=$searchID]'));

      

Sort of.

For a given id, yes, you can do

$nodes = $xc->findnodes("/plm:PLMXML/plm:ExternalFile[\@id=$searchID]");

      

... provided that $searchID is known to be numeric. Note that double quotes in Perl mean that variables are interpolated, so you must escape @id because it is part of the literal string, not a Perl array. The value of $searchID, on the other hand, should become part of the XPath string, so it is not escaped.

Note that because findnodes is called in scalar context here, the result is an XML::LibXML::NodeList object, not an actual node and not an arrayref. To get an arrayref, use square brackets instead of parentheses, as in the following example.



Alternatively, if your search ID might not be numeric, but you know for sure that it is safe to embed in an XPath string (for example, it contains no quotes), you can do the following:

$nodes = [ $xc->findnodes('/plm:PLMXML/plm:ExternalFile[@id="' . $searchID . '"]') ];
print $nodes->[0]->getAttribute('locationRef'); # if you're 100% sure it exists

      

Note that the resulting XPath string encloses the value in double quotes.

Finally, you can skip straight to the attribute value:

print $xc->findvalue('/plm:PLMXML/plm:ExternalFile[@id="' . $searchID . '"]/@locationRef');

      

... if you know there is only one node with this id.
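For ids that may contain quotes, one possible approach is to build the XPath string literal with a small helper. XPath 1.0 has no escape character inside string literals, so a value containing both quote kinds has to be assembled with concat(). The helper name xpath_literal, the sample document, and the namespace URI below are assumptions for illustration, not part of XML::LibXML.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical helper: turn an arbitrary Perl string into an XPath 1.0
# string expression.
sub xpath_literal {
    my ($s) = @_;
    return qq{"$s"} if $s !~ /"/;    # no double quotes: wrap in double quotes
    return qq{'$s'} if $s !~ /'/;    # no single quotes: wrap in single quotes
    # Both kinds present: split on double quotes and rejoin with concat().
    my @parts = map { qq{"$_"} } split /"/, $s, -1;
    return 'concat(' . join(q{, '"', }, @parts) . ')';
}

# Hypothetical sample data; the namespace URI is an assumption.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<PLMXML xmlns="http://www.plmxml.org/Schemas/PLMXMLSchema">
  <ExternalFile id="1234" locationRef="part_a.prt"/>
  <ExternalFile id='say "hi"' locationRef="part_b.prt"/>
</PLMXML>
XML

my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs(plm => 'http://www.plmxml.org/Schemas/PLMXMLSchema');

for my $search_id ('1234', 'say "hi"') {
    my $query = '/plm:PLMXML/plm:ExternalFile[@id='
              . xpath_literal($search_id)
              . ']/@locationRef';
    print "$search_id => ", $xc->findvalue($query), "\n";
}
```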

+2




If you have a DTD for your document that declares the id attribute as type ID, and you make sure the DTD is read when parsing the document, you can efficiently look up elements with a specific ID via $doc->getElementById($id).
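A minimal sketch of the DTD approach, assuming the ID declaration lives in an internal DTD subset so that the parser reads it along with the document. The element names, file names, and id values are made up for illustration. The ids start with a letter because XML requires ID values to be valid names, so a bare number such as 1234 would not be a valid ID.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document whose internal DTD subset declares "id" as type ID.
my $xml = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE files [
  <!ELEMENT files (file*)>
  <!ELEMENT file EMPTY>
  <!ATTLIST file id          ID    #REQUIRED
                 locationRef CDATA #REQUIRED>
]>
<files>
  <file id="id1234" locationRef="part_a.prt"/>
  <file id="id5678" locationRef="part_b.prt"/>
</files>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Uses the parser's ID table instead of evaluating an XPath expression.
my $node = $doc->getElementById('id1234');
print "The File Name = ", $node->getAttribute('locationRef'), "\n" if $node;
```

With an external DTD, the parser would also need to be told to load it; otherwise getElementById has no ID information to work with and returns nothing.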

+1




I think you just need to learn a little bit about XPath expressions. For example, you can do something like this:

my $search_id = "1234";
my $query = "/plm:PLMXML/plm:ExternalFile[\@id = '$search_id']";
foreach my $node ($xc->findnodes($query)) {
    # ...
}

      

In an XPath expression, you can also combine multiple attribute checks, for example:

[@id = '$search_id' and contains(@pathname, '.pdf')]
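As a rough illustration of a combined predicate, the sketch below filters on both the id and a substring of another attribute. It uses locationRef from the question in place of pathname; the sample document, file names, and namespace URI are assumptions.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical sample data; the namespace URI is an assumption.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<PLMXML xmlns="http://www.plmxml.org/Schemas/PLMXMLSchema">
  <ExternalFile id="1234" locationRef="manual.pdf"/>
  <ExternalFile id="5678" locationRef="part_b.prt"/>
</PLMXML>
XML

my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs(plm => 'http://www.plmxml.org/Schemas/PLMXMLSchema');

my $search_id = '1234';

# Both conditions must hold for an element to be selected.
my $query = "/plm:PLMXML/plm:ExternalFile"
          . "[\@id = '$search_id' and contains(\@locationRef, '.pdf')]";

my @matches = $xc->findnodes($query);
print $_->getAttribute('locationRef'), "\n" for @matches;
```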

      

An XPath tutorial will help a lot.

+1

