BeautifulSoup: RuntimeError: Maximum recursion depth exceeded

I can't get past Python's "maximum recursion depth exceeded" RuntimeError when using BeautifulSoup.

I am trying to overwrite nested <code> sections and pull out their content. The HTML in question looks like this (don't ask why it looks like this :)):

<div><code><code><code><code>Code in here</code></code></code></code></div>

      

The function I pass my soup object to is:

def _strip_descendent_code(self, soup):
    sys.setrecursionlimit(2000)
    # soup = BeautifulSoup(html, 'lxml')
    for code in soup.findAll('code'):
        s = ""
        for c in code.descendents:
            if not isinstance(c, NavigableString):
                if c.name != code.name:
                    continue
                elif c.name == code.name:
                    if isinstance(c, NavigableString):
                        s += str(c)
                    else:
                        continue
        code.append(s)
    return str(soup)

      

As you can see, I tried increasing the default recursion limit, but that is not a solution: I have raised it to the point where the C stack hits the machine's memory limit, and the function above still never works.

Any help getting this to work, and pointing out the error(s), would be greatly appreciated.

The stack trace looks like this; the same cycle of calls repeats until the limit is reached:

  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1234, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 529, in _find_all
    i = next(generator)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1269, in descendants
    stopNode = self._last_descendant().next_element
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 284, in _last_descendant
    if is_initialized and self.next_sibling:
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 997, in __getattr__
    return self.find(tag)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1234, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 529, in _find_all
    i = next(generator)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1269, in descendants
    stopNode = self._last_descendant().next_element
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 284, in _last_descendant
    if is_initialized and self.next_sibling:
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 997, in __getattr__
    return self.find(tag)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1234, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 512, in _find_all
    strainer = SoupStrainer(name, attrs, text, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1548, in __init__
    self.text = self._normalize_search_value(text)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1553, in _normalize_search_value
    if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
RuntimeError: maximum recursion depth exceeded while calling a Python object
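Two things in the function above are definitely wrong, whether or not they are what triggers the recursion. First, `code.descendents` is a misspelling of `descendants`; BeautifulSoup treats an unknown attribute name as a search for a child tag with that name, so `code.descendents` behaves like `code.find('descendents')`, and the traceback shows that same `__getattr__` → `find` fallback being entered over and over. Second, the inner `isinstance(c, NavigableString)` test sits inside the `not isinstance(c, NavigableString)` branch, so it can never be true and `s` always stays empty. Below is a minimal sketch of what the function seems intended to do, assuming the goal is to collapse each stack of nested `<code>` tags into a single `<code>` holding just the text. `strip_descendant_code` is a hypothetical module-level stand-in for the method, not a verified fix for the recursion error itself.

```python
from bs4 import BeautifulSoup


def strip_descendant_code(soup):
    """Collapse nested <code> tags so each outermost <code> holds only its text.

    Hypothetical rewrite of _strip_descendent_code as a plain function.
    """
    # Pick the outermost <code> tags up front, before the tree is modified.
    outermost = [code for code in soup.find_all('code')
                 if code.find_parent('code') is None]
    for code in outermost:
        # get_text() gathers every string inside, however deeply nested.
        text = code.get_text()
        # Assigning .string replaces all children (the nested <code> tags)
        # with a single string.
        code.string = text
    return str(soup)


html = '<div><code><code><code><code>Code in here</code></code></code></code></div>'
print(strip_descendant_code(BeautifulSoup(html, 'html.parser')))
# <div><code>Code in here</code></div>
```

Note that no `sys.setrecursionlimit()` call is needed here; the sketch only uses `find_all`, `find_parent`, `get_text` and the `.string` setter.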

      

2 answers


I ran into this problem too and read many web pages about it. I will describe two ways to deal with it.

First, though, I think it helps to know why this happens. Python limits the recursion depth (the default limit is 1000); you can see the current value with print(sys.getrecursionlimit()). I believe BeautifulSoup uses recursion when searching through children, so when the recursion goes deeper than that limit you get RuntimeError: maximum recursion depth exceeded. (From Python 3.5 on, this is raised as RecursionError, which is a subclass of RuntimeError.)
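A small sketch of that limit in action, independent of BeautifulSoup: a deliberately unbounded recursive function runs until Python raises the error, and the depth it reaches stays below `sys.getrecursionlimit()` because other frames are already on the stack. The names `depth_reached` and `dive` are illustrative only.

```python
import sys


def depth_reached():
    """Recurse until Python refuses, and report how deep the recursion got."""
    depth = 0

    def dive():
        nonlocal depth
        depth += 1
        dive()  # no base case: this always recurses again

    try:
        dive()
    except RecursionError:  # a subclass of RuntimeError since Python 3.5
        pass
    return depth


limit = sys.getrecursionlimit()
reached = depth_reached()
print(limit, reached)  # reached is somewhat below limit
```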

First method: use sys.setrecursionlimit() to raise the limit. You could set it as high as 1,000,000, but a limit that high may cause a segmentation fault instead of a clean Python error.
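If you do raise the limit, one way to keep the change from leaking into the rest of the program is to restore the old value afterwards. A minimal sketch, using a hypothetical `recursion_limit` context manager built only on `sys.getrecursionlimit()` and `sys.setrecursionlimit()`:

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily set Python's recursion limit, restoring the old value on exit."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        # Runs even if the body raises, so the limit is always restored.
        sys.setrecursionlimit(old_limit)


# Usage: only the code inside the with-block sees the raised limit.
with recursion_limit(10000):
    print(sys.getrecursionlimit())  # 10000
print(sys.getrecursionlimit())      # back to the previous value
```

This does not make deep recursion any safer; it only scopes the change, so the segmentation-fault risk still applies to whatever limit you pick.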

Second method: use try-except. If you do hit maximum recursion depth exceeded, your algorithm may have a problem. Generally speaking, you can use loops instead of recursion. For your question, you might also process the HTML with replace() or a regex in advance.
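To illustrate using a loop instead of recursion: the sketch below measures how deeply tags are nested by walking the tree with an explicit stack, so the function itself never calls itself however deep the HTML goes. `max_tag_depth` is a hypothetical helper, not part of BeautifulSoup; it relies only on `Tag.children`.

```python
from bs4 import BeautifulSoup, Tag


def max_tag_depth(root):
    """Return the deepest level of Tag nesting below root, without recursion."""
    deepest = 0
    stack = [(root, 0)]  # explicit stack of (node, depth) pairs
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        for child in node.children:
            # Only Tags can have children; skip strings such as "Code in here".
            if isinstance(child, Tag):
                stack.append((child, depth + 1))
    return deepest


html = '<div><code><code><code><code>Code in here</code></code></code></code></div>'
soup = BeautifulSoup(html, 'html.parser')
print(max_tag_depth(soup.div))  # 4
```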



Finally, here is an example:

from bs4 import BeautifulSoup
import sys
# sys.setrecursionlimit(10000)

try:
    doc = '<br>' * 1000
    soup = BeautifulSoup(doc, 'html.parser')
    a = soup.find('br')
    for i in a:
        print(i)
except RuntimeError:  # RecursionError (Python 3.5+) is a subclass of RuntimeError
    print('failed')

      

If you remove the # (so that sys.setrecursionlimit(10000) runs), it can print doc.

Hope this helps.



I'm not sure why this works (I haven't researched the source), but adding .text or .get_text() seems to work around the error for me.

For example, changing

lambda x: BeautifulSoup(x, 'html.parser')

to

lambda x: BeautifulSoup(x, 'html.parser').get_text()

works without the recursion depth error.
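The one difference that is certain is the return type: without `.get_text()` the lambda returns a `BeautifulSoup` object (a `Tag` linked to its whole parse tree), while with it the lambda returns a plain `str` (`.text` gives the same result as `.get_text()` with no arguments). A plausible but unverified explanation is that whatever consumes the result then handles a flat string instead of a tree of linked objects. The sketch below only demonstrates the type change; it also shows that the markup is dropped, so this workaround fits only when you need the text.

```python
from bs4 import BeautifulSoup

html = '<div><code><code>Code in here</code></code></div>'

as_soup = BeautifulSoup(html, 'html.parser')             # a BeautifulSoup (Tag) object
as_text = BeautifulSoup(html, 'html.parser').get_text()  # a plain Python str

print(type(as_soup).__name__)  # BeautifulSoup
print(type(as_text).__name__)  # str
print(as_text)                 # Code in here
```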
