Accessing href value with HTML::TreeBuilder::XPath

I am using LWP::UserAgent, HTML::Selector::XPath and HTML::TreeBuilder::XPath to get the value of the href attribute of the first YouTube video in a search result set.

My code so far:

use LWP::UserAgent;
use HTML::TreeBuilder::XPath;
use HTML::Selector::XPath;

my $ua = LWP::UserAgent->new;

#my $response =..
my $html = "http://www.youtube.com/results?search_query=run+flo+rida";

my $tree = HTML::TreeBuilder::XPath->new;

my $xpath = HTML::Selector::XPath::selector_to_xpath("(//*[@id = 'search-results']/li)[1]/div[2]/h3/a/@href/");
my @nodes = $tree->findnodes($xpath);
print" $nodes[0]";

      

I'm not sure whether my print statement is wrong or some other syntax is wrong. At the moment it prints

HTML::TreeBuilder::XPath=HASH(0x1a78250)

when I want it to print

/watch?v=JP68g3SYObU

      

Thanks for any help!


1 answer


There are several problems here.

  • You should always use strict and use warnings at the top of every Perl program. They catch many errors that are easy to miss, and using them is only polite when you ask others for help with your code. In this case, they would have told you that your double-quoted XPath string contains the array variable names @id and @href, which you almost certainly did not mean to interpolate into the string.

  • You are using HTML::Selector::XPath, which translates a CSS selector into an XPath expression. But you are already supplying an XPath expression, so the translation won't work and the module isn't needed.

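  • Your code never fetches the page at all: $html holds only the URL string, and nothing is ever parsed into $tree. There is no need to use LWP directly to fix that, as HTML::TreeBuilder has a constructor new_from_url that fetches and parses the HTML page for you (it uses LWP::UserAgent behind the scenes, and HTML::TreeBuilder::XPath inherits the method).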
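To make the first point concrete, here is a small core-Perl sketch (no CPAN modules needed) that takes the XPath string from the question and compares three cases: single quotes, double quotes under use strict, and double quotes without strict. The variable names are only for illustration. The trailing / after @href is kept to match the question, although it is not valid XPath either.

```perl
use strict;
use warnings;

# The XPath from the question in single quotes: Perl passes '@id' and
# '@href' through untouched.
my $literal = q{(//*[@id = 'search-results']/li)[1]/div[2]/h3/a/@href/};

# The same text as Perl source for a double-quoted string.
my $double_quoted = q{"(//*[@id = 'search-results']/li)[1]/div[2]/h3/a/@href/"};

# A string eval inherits "use strict" from this scope, so compiling the
# double-quoted string fails: @id and @href are undeclared arrays.
eval $double_quoted;
my $strict_error = $@;

# Without strict (and with warnings silenced) it compiles, but the empty
# arrays are silently interpolated as empty strings.
my $mangled = do {
    no strict 'vars';
    no warnings;
    eval $double_quoted;
};

print "single-quoted:  $literal\n";
print "under strict:   ", ($strict_error ? "compile error\n" : "compiled\n");
print "without strict: $mangled\n";
```

Without strict, the XPath engine would receive (//*[ = 'search-results']/li)[1]/div[2]/h3/a// instead of the expression you typed, which is why single quotes (as in the program below) are the safer habit for XPath strings.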

This program seems to do what you want. I also added the URI module to get the absolute URL from the relative value of the href attribute.



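use strict;
use warnings;

use HTML::TreeBuilder::XPath;
use URI;

my $url = "http://www.youtube.com/results?search_query=run+flo+rida";

# Fetch and parse the page in one step (LWP::UserAgent is used internally)
my $tree = HTML::TreeBuilder::XPath->new_from_url($url);

# Select the href attribute nodes of the result titles; the single quotes
# stop Perl from interpolating @id, @class and @href
my $anchor = $tree->findnodes('//ol[@id="search-results"]//h3[@class="yt-lockup2-title"]/a/@href');

# Resolve the first (relative) href against the page URL
my $href = URI->new_abs($anchor->[0]->getValue, $url);
print $href, "\n";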

      

Output

http://www.youtube.com/watch?v=JP68g3SYObU
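Because the live page may no longer contain this markup, here is an offline sketch of the same steps that can be tried without a network connection. The HTML snippet is hypothetical: it only reproduces the elements the XPath above relies on (ol#search-results, h3.yt-lockup2-title and a/@href), and the second video ID is a placeholder. It uses new_from_content (inherited from HTML::TreeBuilder) instead of new_from_url and calls findnodes in list context.

```perl
use strict;
use warnings;

use HTML::TreeBuilder::XPath;
use URI;

my $base = 'http://www.youtube.com/results?search_query=run+flo+rida';

# Hypothetical, reduced markup: NOT the real YouTube page, just enough
# structure for the XPath expression to match.
my $html = <<'HTML';
<html><body>
<ol id="search-results">
  <li><div><h3 class="yt-lockup2-title"><a href="/watch?v=JP68g3SYObU">First</a></h3></div></li>
  <li><div><h3 class="yt-lockup2-title"><a href="/watch?v=PLACEHOLDER">Second</a></h3></div></li>
</ol>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# In list context findnodes returns the matching attribute nodes in document order
my @hrefs = $tree->findnodes('//ol[@id="search-results"]//h3[@class="yt-lockup2-title"]/a/@href');
die "no search results found\n" unless @hrefs;

# getValue returns the attribute text; new_abs resolves it against the page URL
my $first = URI->new_abs($hrefs[0]->getValue, $base);
print "$first\n";

$tree->delete;    # free the parsed HTML::Element tree
```

With this snippet it prints http://www.youtube.com/watch?v=JP68g3SYObU, matching the output above. The die guard replaces the unchecked $anchor->[0], so an empty result set produces a clear message instead of a method call on undef.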

      
