How to get rid of unwanted nodes returned by findnodes from the Perl XML::LibXML module?

Question


Below is just a small portion of the XML I am working on. I want to extract all the attributes, tag names and text under the subtree.

<?xml version='1.0' encoding='UTF-8'?>
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>

I have some code like this:

#!/usr/bin/perl
use XML::LibXML;
use Data::Dumper;

$parser = XML::LibXML->new();
$Chunk = $parser->parse_file("numone.xml");

@Equipment = $Chunk->findnodes('//Equipment');
foreach $at ($Equipment[0]->getAttributes()) {
    ($na,$nv) = ($at -> getName(),$at -> getValue());
    print "$na => $nv\n";
}

@Equipment = $Chunk->findnodes('//Equipment/attributes');
@Attr = $Equipment[0]->childNodes;
print Dumper(@Attr);

foreach $at (@Attr) {
    ($na,$nv) = ($at->nodeName, $at->textContent);
    print "$na => $nv\n";
}

I am getting the following results:

id => ABC001
model => TV
version => 3_00
$VAR1 = bless( do{\(my $o = 10579528)}, 'XML::LibXML::Text' );
$VAR2 = bless( do{\(my $o = 13643928)}, 'XML::LibXML::Element' );
$VAR3 = bless( do{\(my $o = 13657192)}, 'XML::LibXML::Text' );
$VAR4 = bless( do{\(my $o = 13011432)}, 'XML::LibXML::Element' );
$VAR5 = bless( do{\(my $o = 10579752)}, 'XML::LibXML::Text' );
$VAR6 = bless( do{\(my $o = 10565696)}, 'XML::LibXML::Element' );
$VAR7 = bless( do{\(my $o = 13046400)}, 'XML::LibXML::Text' );
#text =>

Location => Chicago
#text =>

Latitude => 30.970
#text =>

Longitude => -90.723
#text =>

Extracting the attributes looks fine, but extracting the tag names and text produces extra output. My questions:

Where did the ::Text elements come from?
How can I get rid of those extra elements and the #text things?

Thanks,


xml perl libxml2

mkt2012 07 Mar 12 at 16:52


Greg Bacon · Answer 1 · 2012-03-07T17:09:15+0000

The extra nodes are text nodes that contain only whitespace, such as the newlines between elements. Skip them if you want:

@Equipment = $Chunk->findnodes('//Equipment/attributes');
@Attr = $Equipment[0]->childNodes;
foreach $at (@Attr) {
    ($na,$nv) = ($at->nodeName, $at->textContent);

    next if $na eq "#text";  # skip text nodes between elements

    print "$na => $nv\n";
}

Output:

id => ABC001
model => TV
version => 3_00
Location => Chicago
Latitude => 30.970
Longitude => -90.723
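
Another way to avoid the whitespace nodes, sketched here as an alternative to the name check above rather than as part of the original answer, is to let XPath select only element children: a trailing /* step matches elements and never text nodes. The example inlines the question's XML as a string (an assumption made so that it runs without numone.xml), and the @pairs array is a name introduced only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Inline copy of the sample document from the question, so the sketch
# runs without numone.xml on disk (assumption for illustration).
my $xml = <<'XML';
<?xml version='1.0' encoding='UTF-8'?>
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# The trailing /* step matches element children only, so the
# whitespace-only text nodes never enter the result list.
my @pairs;
for my $el ($doc->findnodes('//Equipment/attributes/*')) {
    push @pairs, $el->nodeName . ' => ' . $el->textContent;
}

print "$_\n" for @pairs;
```

This leaves the parsed tree untouched, which may matter if the whitespace has to survive when the document is written back out.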

Borodin · Answer 2 · 2012-03-07T19:23:37+0000

First of all, you need use strict and use warnings at the beginning of your program, and you should declare all variables at the point of first use with my. This will reveal many simple bugs and is especially important in programs you are asking for help with.

As you have been told, the XML::LibXML::Text entries are whitespace-only text nodes. If you want the XML::LibXML parser to ignore them, set the no_blanks option on the parser object.

Also, you would be better off using the more modern load_xml method instead of the older parse_file, as shown below:

my $parser = XML::LibXML->new(no_blanks => 1);
my $Chunk = $parser->load_xml(location => "numone.xml");

The result of this modified version of the program looks like

id => ABC001
model => TV
version => 3_00
$VAR1 = bless( do{\(my $o = 7008120)}, 'XML::LibXML::Element' );
$VAR2 = bless( do{\(my $o = 7008504)}, 'XML::LibXML::Element' );
$VAR3 = bless( do{\(my $o = 7008144)}, 'XML::LibXML::Element' );
Location => Chicago
Latitude => 30.970
Longitude => -90.723
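
For reference, here is a minimal sketch of how the whole program might look with these suggestions applied. It is not the answerer's exact code: the XML from the question is inlined as a string so the example runs without numone.xml, the Data::Dumper call is dropped, and the variable names ($doc, $equipment, $attributes, @lines) are chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Inline copy of the sample document from the question (assumption:
# the real program would pass location => "numone.xml" instead).
my $xml = <<'XML';
<?xml version='1.0' encoding='UTF-8'?>
<Warehouse>
<Equipment id="ABC001" model="TV" version="3_00">
<attributes>
<Location>Chicago</Location>
<Latitude>30.970</Latitude>
<Longitude>-90.723</Longitude>
</attributes>
</Equipment></Warehouse>
XML

# no_blanks asks the parser to drop whitespace-only text nodes.
my $parser = XML::LibXML->new(no_blanks => 1);
my $doc    = $parser->load_xml(string => $xml);

my ($equipment) = $doc->findnodes('//Equipment');
die "no Equipment element found\n" unless $equipment;

my @lines;

# Attributes of the <Equipment> element itself.
for my $at ($equipment->attributes) {
    push @lines, $at->nodeName . ' => ' . $at->value;
}

# Child elements of <attributes>; with no_blanks set there are no
# whitespace text nodes left between them.
my ($attributes) = $equipment->findnodes('attributes');
die "no attributes element found\n" unless $attributes;

for my $child ($attributes->childNodes) {
    push @lines, $child->nodeName . ' => ' . $child->textContent;
}

print "$_\n" for @lines;
```

Run against the sample document, this should print the same six name => value lines as the output above, without the Dumper lines.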
