Accessing only some elements of a BeautifulSoup result with negative indices

I am using BeautifulSoup4 to parse a document and I am seeing some strange behavior. The relevant code snippet looks like this:

for sale_table in sales_soup.find_all('table'):
    rows = sale_table.find_all('tr')
    grantor = rows[3]


However, this raises an IndexError ("list index out of range"). So I did some basic checks: len(rows) == 4 both just before and just after the assignment (when I use an index that doesn't raise). I can also access the first and second elements with rows[0] and rows[1]. However, I can only reach the 3rd and 4th items through rows[-1] and rows[-2]; trying the indices 2, 3, -3 or -4 raises the out-of-range exception. Also, when I dump the result with file.write(str(rows)), the HTML matches the HTML of the test document exactly.
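
A plain-Python sketch of why this pattern points to a short list: a list of length n accepts only the indices -n through n-1. With 2 elements, [0], [1], [-1] and [-2] succeed, while 2, 3, -3 and -4 raise IndexError, which is exactly the set of failures described. The lists and the valid_indices helper below are hypothetical stand-ins for illustration, not BeautifulSoup objects.

```python
# Plain lists stand in for the result of find_all('tr'); only their length matters.
def valid_indices(seq):
    """Return every index (positive and negative) that seq accepts without IndexError."""
    n = len(seq)
    return list(range(-n, n))

four_rows = ["tr0", "tr1", "tr2", "tr3"]
two_rows = ["tr0", "tr1"]

print(valid_indices(four_rows))  # [-4, -3, -2, -1, 0, 1, 2, 3]
print(valid_indices(two_rows))   # [-2, -1, 0, 1]

# Indexing past the end of the short list fails the same way rows[3] did.
try:
    two_rows[3]
except IndexError as exc:
    print("two_rows[3]:", exc)  # two_rows[3]: list index out of range
```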

In summary, I can access the entire list this way, but I would like to understand why I am getting this strange exception.

Sorry guys, the answer is that I'm an idiot. There is an inconsistent table in the markup that is shorter than the others, and it raises the exception. Running the loop one iteration at a time shows that len(rows) is not 4 on every iteration; sorry for the misinformation. Is it bad form to edit this question, given that it turned out to be based on incorrect information?
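
A quick way to find such a shorter table is to record the row count of every table instead of stepping through the loop by hand. This is a sketch under assumptions: the HTML string below is hypothetical markup invented to mimic one short table, and it uses Python's built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second table is shorter than the others.
html = """
<table><tr><td>a</td></tr><tr><td>b</td></tr><tr><td>c</td></tr><tr><td>d</td></tr></table>
<table><tr><td>x</td></tr><tr><td>y</td></tr></table>
<table><tr><td>1</td></tr><tr><td>2</td></tr><tr><td>3</td></tr><tr><td>4</td></tr></table>
"""

sales_soup = BeautifulSoup(html, "html.parser")

# Row count of every table, in document order.
row_counts = [len(t.find_all("tr")) for t in sales_soup.find_all("table")]
print(row_counts)  # [4, 2, 4]

# Positions of the tables where rows[3] would raise IndexError.
short_tables = [i for i, n in enumerate(row_counts) if n < 4]
print(short_tables)  # [1]
```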



1 answer


You should never index into a list of unknown length without checking its length first. Never trust the markup to be correct all the time.

In my experience with BeautifulSoup, you have to write a lot of if statements to protect yourself. Change the code in the question to something like this:



for sale_table in sales_soup.find_all('table'):
    rows = sale_table.find_all('tr')
    # Only index rows[3] when this table actually has at least 4 rows.
    if len(rows) > 3:
        grantor = rows[3]
    else:
        grantor = None
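
An alternative to the length check above is to attempt the index and catch IndexError (the "ask forgiveness" style). The helper name nth_or_none is hypothetical; it works on any list, including the result of find_all(), which is a list subclass. Plain lists stand in for real results here.

```python
def nth_or_none(seq, index, default=None):
    """Return seq[index], or default when seq is too short for that index."""
    try:
        return seq[index]
    except IndexError:
        return default

# Plain lists stand in for find_all('tr') results.
print(nth_or_none(["r0", "r1", "r2", "r3"], 3))  # r3
print(nth_or_none(["r0", "r1"], 3))              # None
```

Inside the loop this would read grantor = nth_or_none(sale_table.find_all('tr'), 3), which behaves the same as the if/else version.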


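Also, have a look at the BS4 documentation for .find_all(); it offers more options that might be helpful for your use case. If you only need the 4th element, pass limit=4 as a keyword argument. Note that limit caps the number of results rather than guaranteeing it, so a short table still returns fewer than 4 rows.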
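
A small sketch of limit=4 with find_all(), assuming hypothetical tables built inline: the search stops after four matching <tr> tags, but a shorter table still yields fewer rows, so the length check remains necessary.

```python
from bs4 import BeautifulSoup

# Hypothetical tables: one with six rows, one with a single row.
long_html = "<table>" + "".join(f"<tr><td>{i}</td></tr>" for i in range(6)) + "</table>"
short_html = "<table><tr><td>0</td></tr></table>"

long_table = BeautifulSoup(long_html, "html.parser").find("table")
short_table = BeautifulSoup(short_html, "html.parser").find("table")

# limit=4 stops the search once four <tr> tags have been found.
rows = long_table.find_all("tr", limit=4)
print(len(rows))  # 4

# limit is only an upper bound: the short table returns just one row.
short_rows = short_table.find_all("tr", limit=4)
print(len(short_rows))  # 1

# So the guard is still needed before taking the 4th row.
grantor = rows[3] if len(rows) > 3 else None
print(grantor.get_text())  # 3
```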
