I get a recursion error [RuntimeError: maximum recursion depth exceeded while calling a Python object] - but my code is iterative - or is it?

I am getting a recursion error:

RuntimeError: maximum recursion depth exceeded while calling a Python object

But my code is iterative ... or isn't it? I thought it was, based on the documentation (here, for example: http://www.pythonlearn.com/html-008/cfbook006.html ). I have read about how to convert code from recursive to iterative (for example http://blog.moertel.com/posts/2013-05-11-recursive-to-iterative.html ), but I just can't see how my code is recursive in the first place.
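
From what I have read, this error normally comes from something like the following minimal case (the name call_self is made up purely for illustration):

import sys

print(sys.getrecursionlimit())   # typically 1000 by default

def call_self():
    # no loop anywhere, but every call adds a new stack frame
    call_self()

call_self()   # RuntimeError: maximum recursion depth exceeded
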

This code goes to the site, runs a search, and gets back about 122 pages of results. It then scans each results page and collects the links. Then it has to visit each link and scrape the text/HTML from each one.

The code works nicely until it gets to the final loop, for url in article_urls: . It will grab and store (in Dropbox) just over 200 of the results before it returns the error.

I am trying to solve a puzzle: how can I avoid this error?

Here is the code:



from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

def isReady(browser):
    return browser.execute_script("return document.readyState") == "complete"

def waitUntilReady(browser):
    if not isReady(browser):
        waitUntilReady(browser)

browser = webdriver.Firefox()
browser.get('http://www.usprwire.com/cgi-bin/news/search.cgi')

# make a search
query = WebDriverWait(browser, 60).until(EC.presence_of_element_located((By.NAME, "query")))
query.send_keys('"test"')
submit = browser.find_element_by_xpath("//input[@value='Search']")
submit.click()
numarticles = 0

# grab article urls
npages = 1
article_urls = []
for page in range(1, npages + 1):
    article_urls += [elm.get_attribute("href") for elm in browser.find_elements_by_class_name('category_links')]
    if page <= 121: #click to the next page
        browser.find_element_by_link_text('[>>]').click()
    if page == 122: #last page in search results, so no '[>>]' to click on. Move on to next steps.
        continue



# iterate over urls and save the HTML source
for url in article_urls:
    browser.get(url)
    waitUntilReady(browser)
    numarticles = numarticles+1
    title = browser.current_url.split("/")[-1]
    with open('/Users/My/Dropbox/File/Place/'+str(numarticles)+str(title), 'w') as fw:
        fw.write(browser.page_source.encode('utf-8'))


Thanks a lot in advance for any input.

2 answers


Obviously, your waitUntilReady goes into infinite recursion by calling itself.

You should change it to something like this:



import time

while not isReady(browser):
    time.sleep(1)


Waiting for a page to load in Selenium is not as obvious as it sounds; you can read more in Harry J. W. Percival's article.
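
For instance, the same readyState check can be handed to Selenium's WebDriverWait, which polls with a built-in timeout instead of recursing or looping forever. A minimal sketch, reusing the browser object from the question (the 30-second timeout is an arbitrary illustrative choice):

from selenium.webdriver.support.ui import WebDriverWait

def waitUntilReady(browser, timeout=30):
    # poll document.readyState until it is "complete";
    # raises TimeoutException if it never gets there within `timeout` seconds
    WebDriverWait(browser, timeout).until(
        lambda driver: driver.execute_script("return document.readyState") == "complete"
    )
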



waitUntilReady is a recursive function! It can call itself many times, especially if you have a slow connection.

Here's a possible workaround:



import time

def waitUntilReady(browser):
    while not isReady(browser):
        time.sleep(10)
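
If you are worried about a page that never finishes loading, you could also cap how long the loop runs, for example (the timeout parameter and the 60-second limit here are just illustrative choices):

import time

def waitUntilReady(browser, timeout=60):
    # poll readyState, but give up after `timeout` seconds instead of waiting forever
    deadline = time.time() + timeout
    while not isReady(browser):
        if time.time() > deadline:
            raise RuntimeError("page did not reach readyState 'complete' in time")
        time.sleep(1)
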

