How to get rid of unwanted nodes returned by findnodes in the Perl XML::LibXML module?
Below is just a small portion of the XML I am working on. I want to extract all the attributes, tag names and text under the Equipment element.
<?xml version='1.0' encoding='UTF-8'?>
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>
I have some code like this:
#!/usr/bin/perl
use XML::LibXML;
use Data::Dumper;
$parser = XML::LibXML->new();
$Chunk = $parser->parse_file("numone.xml");
@Equipment = $Chunk->findnodes('//Equipment');
foreach $at ($Equipment[0]->getAttributes()) {
($na,$nv) = ($at -> getName(),$at -> getValue());
print "$na => $nv\n";
}
@Equipment = $Chunk->findnodes('//Equipment/attributes');
@Attr = $Equipment[0]->childNodes;
print Dumper(@Attr);
foreach $at (@Attr) {
($na,$nv) = ($at->nodeName, $at->textContent);
print "$na => $nv\n";
}
I am getting the following results:
id => ABC001
model => TV
version => 3_00
$VAR1 = bless( do{\(my $o = 10579528)}, 'XML::LibXML::Text' );
$VAR2 = bless( do{\(my $o = 13643928)}, 'XML::LibXML::Element' );
$VAR3 = bless( do{\(my $o = 13657192)}, 'XML::LibXML::Text' );
$VAR4 = bless( do{\(my $o = 13011432)}, 'XML::LibXML::Element' );
$VAR5 = bless( do{\(my $o = 10579752)}, 'XML::LibXML::Text' );
$VAR6 = bless( do{\(my $o = 10565696)}, 'XML::LibXML::Element' );
$VAR7 = bless( do{\(my $o = 13046400)}, 'XML::LibXML::Text' );
#text =>
Location => Chicago
#text =>
Latitude => 30.970
#text =>
Longitude => -90.723
#text =>
Extracting the attributes works fine, but the tag names and text come back with extra entries. My questions:
- Where do the XML::LibXML::Text elements come from?
- How can I get rid of these additional elements and the #text entries?
Thanks,
The additional nodes are text nodes that contain only whitespace, such as the newlines between elements. You can skip them:
@Equipment = $Chunk->findnodes('//Equipment/attributes');
@Attr = $Equipment[0]->childNodes;
foreach $at (@Attr) {
($na,$nv) = ($at->nodeName, $at->textContent);
next if $na eq "#text"; # skip text nodes between elements
print "$na => $nv\n";
}
Output:
id => ABC001
model => TV
version => 3_00
Location => Chicago
Latitude => 30.970
Longitude => -90.723
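A variant of the same loop, sketched below, tests the node's class instead of comparing its name with "#text", so comment nodes (whose nodeName is "#comment") would be skipped as well. The sample XML is embedded as a string rather than read from numone.xml; that is an assumption made only so the sketch runs on its own. Selecting //Equipment/attributes/* with findnodes would be another way to get only the element children.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Assumption: the sample document is inlined instead of read from numone.xml
my $xml = <<'XML';
<?xml version='1.0' encoding='UTF-8'?>
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my ($attributes) = $doc->findnodes('//Equipment/attributes');

my @pairs;
foreach my $node ($attributes->childNodes) {
    # Keep only element nodes: whitespace-only text nodes and
    # comment nodes are not XML::LibXML::Element objects.
    next unless $node->isa('XML::LibXML::Element');
    push @pairs, [ $node->nodeName, $node->textContent ];
}

print "$_->[0] => $_->[1]\n" for @pairs;
```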
First of all, you should put use strict and use warnings at the beginning of your program, and declare every variable with my at the point of first use. This will reveal many simple bugs, and it is especially important in programs you are asking others to help with.
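For illustration, here is a sketch of the question's attribute loop rewritten under strict and warnings, with every variable declared with my where it is first used. It assumes the sample XML is embedded as a string instead of read from numone.xml, and it uses the attributes, nodeName and value methods in place of getAttributes, getName and getValue.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Assumption: the sample document is inlined instead of read from numone.xml
my $xml = <<'XML';
<?xml version='1.0' encoding='UTF-8'?>
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>
XML

my $parser = XML::LibXML->new();
my $doc    = $parser->parse_string($xml);

# Under "use strict", a misspelled variable such as $equipmnet is a
# compile-time error instead of a silently empty global.
my ($equipment) = $doc->findnodes('//Equipment');

my %attr;
foreach my $at ($equipment->attributes) {
    my ($name, $value) = ($at->nodeName, $at->value);
    $attr{$name} = $value;
    print "$name => $value\n";
}
```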
As you were told, the XML::LibXML::Text entries are text nodes that contain only whitespace. If you want XML::LibXML to ignore them, set the no_blanks option on the parser object.
Also, you would be better off using the newer load_xml method instead of the older parse_file, as shown below:
my $parser = XML::LibXML->new(no_blanks => 1);
my $Chunk = $parser->load_xml(location => "numone.xml");
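Putting both suggestions together, a self-contained sketch might look like this. It assumes the XML is passed to load_xml as a string rather than read from numone.xml, and it keeps the children in @children so their count can be checked.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Assumption: the sample document is inlined instead of read from numone.xml
my $xml = <<'XML';
<?xml version='1.0' encoding='UTF-8'?>
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>
XML

# no_blanks => 1 asks the parser to drop whitespace-only text nodes
my $parser = XML::LibXML->new(no_blanks => 1);
my $doc    = $parser->load_xml(string => $xml);

my ($attributes) = $doc->findnodes('//Equipment/attributes');
my @children = $attributes->childNodes;

# With the blank text nodes gone, only the element children remain
foreach my $child (@children) {
    printf "%s => %s\n", $child->nodeName, $child->textContent;
}
```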
The output of this modified version of the program looks like this:
id => ABC001
model => TV
version => 3_00
$VAR1 = bless( do{\(my $o = 7008120)}, 'XML::LibXML::Element' );
$VAR2 = bless( do{\(my $o = 7008504)}, 'XML::LibXML::Element' );
$VAR3 = bless( do{\(my $o = 7008144)}, 'XML::LibXML::Element' );
Location => Chicago
Latitude => 30.970
Longitude => -90.723