Perl: parse an XML file fetched over HTTP when extra lines come before the XML
I am trying to write a script that collects information from an XML file on a remote server. The server requires authentication. It uses HTTP basic authentication, which I got working, but I cannot parse the data because of all the lines that come before the XML. Is there a way to avoid getting these lines so that the XML is parsed correctly?
Code:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common;
use IO::Socket::SSL qw(SSL_VERIFY_NONE);   # provides the SSL_VERIFY_NONE constant
use XML::Simple;

my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => 'https://192.168.1.10/getxml?/home/');
$ua->ssl_opts(SSL_verify_mode => SSL_VERIFY_NONE);   # used to ignore the certificate
$req->authorization_basic('admin', 'test');
my $test = $ua->request($req)->as_string;
print $test;
# create object
my $xml = XML::Simple->new;
# parse the XML data
my $data = $xml->XMLin($test);
# access XML data
print $data->{status}[0]{productID};
Output:
HTTP/1.1 200 OK
Connection: close
Date: Wed, 24 Sep 2014 01:12:20 GMT
Server:
Content-Length: 252
Content-Type: text/xml; charset=UTF-8
Client-Date: Wed, 24 Sep 2014 01:11:59 GMT
Client-Peer: 192.168.1.10:443
Client-Response-Num: 1
Client-SSL-Cert-Issuer: XXXXXXXXXXXX
Client-SSL-Cert-Subject: XXXXXXXXXXXXX
Client-SSL-Cipher: XXXXXXXXXXXX
Client-SSL-Socket-Class: IO::Socket::SSL
<?xml version="1.0"?>
<Status>
<SystemUnit item="1">
<ProductId item="1">TEST SYSTEM</ProductId>
</SystemUnit>
</Status>
:1: parser error : Start tag expected, '<' not found
HTTP/1.1 200 OK
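The extra lines come from as_string, which serializes the whole HTTP::Response object: the status line, the headers, a blank line, and only then the body. The sketch below builds a response by hand so it runs without a server (the header and body values are illustrative, trimmed from the output above) and compares as_string with content, which returns only the body. In the real script this would suggest calling $ua->request($req)->content (or decoded_content, which also undoes any Content-Encoding such as gzip) instead of as_string; that is an alternative approach, not the one taken in the answer below.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Sample body, trimmed from the output shown in the question (illustrative).
my $xml = qq{<?xml version="1.0"?>\n<Status>\n<SystemUnit item="1">\n}
        . qq{<ProductId item="1">TEST SYSTEM</ProductId>\n</SystemUnit>\n</Status>\n};

# Build a response by hand instead of fetching one over the network.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/xml; charset=UTF-8' ],
    $xml,
);
$res->protocol('HTTP/1.1');

# as_string: status line + headers + blank line + body
my $whole = $res->as_string;
# content: the raw body only, which is what an XML parser expects
my $body = $res->content;

print "as_string starts with: ", (split /\n/, $whole)[0], "\n";
print "content starts with:   ", (split /\n/, $body)[0], "\n";
```

Passing $body rather than $whole to XMLin means there is nothing to strip off before parsing.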
Answer:
I would find a match for the first "<" and take the rest of the data from there. This skips the status line and headers that come before the XML and don't interest you. The code looks like this:
# Capture everything from the first '<' to the end of the response
$test =~ m/(<.*)/s or die "No XML found in the response\n";
my $xmlData = $1;
my $data = $xml->XMLin($xmlData);
# Fix the print to get the item I believe you are trying to obtain
print $data->{SystemUnit}{ProductId}{content} . "\n";
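The regex matches the first "<" and captures everything after it. The s modifier lets "." match newlines as well, so the capture runs to the end of the response instead of stopping at the end of the first line. $1 holds the text captured by the match; I assigned it to a variable so you can print it or inspect it in the debugger. I also fixed the print statement so it outputs "TEST SYSTEM", the content of the ProductId element: XML::Simple drops the root element (Status) by default, and because ProductId has an attribute (item="1"), its text is stored under the content key.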
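To see what the s modifier changes, here is a small core-Perl sketch run against a shortened copy of the response text (the sample string is illustrative). It also checks the result of the match directly, because $1 keeps its value from an earlier successful match when a later match fails, so reading $1 without checking can silently return stale data. The helper name extract_xml is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened copy of the text returned by as_string (illustrative).
my $raw = join "\n",
    'HTTP/1.1 200 OK',
    'Content-Type: text/xml; charset=UTF-8',
    '',
    '<?xml version="1.0"?>',
    '<Status>',
    '<SystemUnit item="1">',
    '<ProductId item="1">TEST SYSTEM</ProductId>',
    '</SystemUnit>',
    '</Status>';

# Hypothetical helper: return everything from the first '<' onwards,
# or die if the text contains no '<' at all.
sub extract_xml {
    my ($text) = @_;
    # /s lets '.' match newlines, so the capture runs to the end of the text.
    my ($xml) = $text =~ m/(<.*)/s
        or die "no XML found in response\n";
    return $xml;
}

my $with_s = extract_xml($raw);

# Without /s, '.' stops at the first newline, so only one line is captured.
my ($without_s) = $raw =~ m/(<.*)/;

print "with /s:    ", scalar(() = $with_s =~ /\n/g) + 1, " lines\n";
print "without /s: $without_s\n";
```

Without /s only the XML declaration line is captured, which is why the modifier matters here.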