Parsing HTML using LXML in Python
I am trying to parse a website for the href values of links like
blahblahblah
<a href="THIS IS WHAT I WANT" title="NOT THIS">I DONT CARE ABOUT THIS EITHER</a>
blahblahblah
(there are a lot of them, and I want them all collected in some symbolic form). Unfortunately, the HTML is very large and fairly complex, so walking the tree just to sort through the nested elements can take a while. Is there an easy way to get just the href values?
Thanks!
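To make the goal concrete, here is a minimal sketch of the kind of extraction I mean, assuming the lxml.html module and an XPath query. The sample markup and the extract_hrefs helper are made up for illustration, and I am not sure this is the best approach for a very large page.

```python
from lxml import html

# Hypothetical sample standing in for the real (much larger) page.
SAMPLE = """
<html><body>
<p>blahblahblah</p>
<a href="/first" title="NOT THIS">I DONT CARE ABOUT THIS EITHER</a>
<div><span><a href="/second" title="NOT THIS">nested link</a></span></div>
<a name="no-href-here">anchor without an href</a>
</body></html>
"""

def extract_hrefs(markup):
    """Return the href value of every <a> element, at any nesting depth."""
    tree = html.fromstring(markup)
    # "//a/@href" selects the href attribute of all <a> elements in the
    # document; anchors without an href contribute nothing to the result.
    return tree.xpath("//a/@href")

hrefs = extract_hrefs(SAMPLE)
print(hrefs)  # ['/first', '/second']
```

The XPath query does the searching, so there is no hand-written loop over the nested elements.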
1 answer