Trying to use bs4 to skip an attribute if no data is available

I am trying to write a program that extracts data from a URL and formats it so that I can copy it into another program. Everything else works, but I can't get it to skip an element when its imagelink tag contains no img src.

import requests, sys, webbrowser, bs4
res = requests.get('http://hzws.selco.info/prototype.php?type=new-arrivals&lib=nor&collect=Bnewnf,Bnewmys,Bnewf,Bnewsf&days=14&key=7a8adfa9aydfa999997af')
res.raise_for_status()

soup = bs4.BeautifulSoup(res.text, "lxml")

img = soup.select('imagelink')  # why won't this pull anything?
link = soup.select('cataloglink')

length = min([14, len(img)])
for i in range(length):
  img1 = img[i].getText()
  link1 = link[i].getText()
  print('<div>' + link1 + img1 + '</a></div>')

      

At the moment this prints every URL, whether or not it has an image attached. I've tried many different ways to make it skip the element when there is no img src. Any ideas?


1 answer


Looking at the BS4 docs, "lxml" is actually an HTML parser. Since you are trying to parse an XML page, you need to replace it with "lxml-xml" (the lxml XML parser). That should work.
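Here is a minimal sketch of that change, combined with skipping entries that have no img src. It assumes the feed wraps each entry in an element named `item` (hypothetical, since the real page's structure isn't shown) and that `cataloglink` and `imagelink` hold escaped HTML such as `<a href=...>` and `<img src=...>`. The sample XML, the `item` tag, and the `build_divs` helper are all illustrative, not taken from the actual feed. Pairing each link with the image from the same entry also avoids the misalignment you can get when indexing two separate lists by position.

```python
from bs4 import BeautifulSoup

# Hypothetical sample feed: the <item> wrapper and the escaped-HTML contents
# are assumptions about the real page, used only for illustration.
SAMPLE_XML = """<items>
  <item>
    <cataloglink>&lt;a href="http://example.com/1"&gt;</cataloglink>
    <imagelink>&lt;img src="http://example.com/1.jpg"&gt;</imagelink>
  </item>
  <item>
    <cataloglink>&lt;a href="http://example.com/2"&gt;</cataloglink>
    <imagelink></imagelink>
  </item>
</items>"""


def build_divs(xml_text, limit=14):
    """Return up to `limit` <div> strings, skipping entries without an img src."""
    # "lxml-xml" selects lxml's XML parser instead of its HTML parser.
    soup = BeautifulSoup(xml_text, "lxml-xml")
    divs = []
    for item in soup.find_all("item"):  # "item" is an assumed tag name
        img = item.find("imagelink")
        link = item.find("cataloglink")
        if img is None or link is None:
            continue  # entry is missing one of the two tags
        img_html = img.get_text(strip=True)
        if "src" not in img_html:
            continue  # empty imagelink, or no img src inside it
        divs.append("<div>" + link.get_text(strip=True) + img_html + "</a></div>")
        if len(divs) == limit:
            break
    return divs


if __name__ == "__main__":
    for div in build_divs(SAMPLE_XML):
        print(div)
```

With the real feed you would pass `res.text` instead of `SAMPLE_XML` and adjust the wrapper tag name to whatever the page actually uses.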
