Web Crawler - TooManyRedirects: 30 redirects exceeded. (Python)

Question

I tried to follow one of the YouTube tutorials, but I ran into a problem. I am new to Python. I know there are one or two similar questions, but I read them and did not understand the answers. Can anyone help me? Thanks.

import requests
from bs4 import BeautifulSoup
def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.thenewboston.com/forum/home.php?page=" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'class': 'post-title'}):
            href = link.get('href')
            print(href)
        page += 1
trade_spider(2)

After running the program, I got the error below.

Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/Basic/WebCrawlerTest.py", line 19, in <module>
    trade_spider(2)
  File "C:/Users/User/PycharmProjects/Basic/WebCrawlerTest.py", line 9, in trade_spider
    source_code = requests.get(url)
  File "C:\Users\User\AppData\Roaming\Python\Python34\site-packages\requests\api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\User\AppData\Roaming\Python\Python34\site-packages\requests\api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "C:\Users\User\AppData\Roaming\Python\Python34\site-packages\requests\sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\User\AppData\Roaming\Python\Python34\site-packages\requests\sessions.py", line 594, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "C:\Users\User\AppData\Roaming\Python\Python34\site-packages\requests\sessions.py", line 594, in <listcomp>
    history = [resp for resp in gen] if allow_redirects else []
  File "C:\Users\User\AppData\Roaming\Python\Python34\site-packages\requests\sessions.py", line 114, in resolve_redirects
    raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

+3

python web-crawler

hamster 01 june 15 at 13:05


2 answers

Well, it looks like the page you are trying to crawl is simply broken. Try opening https://www.thenewboston.com/forum/home.php?page=1 in your web browser. When I try it in Chrome, I get this error message:

This web page has a redirect loop

ERR_TOO_MANY_REDIRECTS

You will need to decide for yourself how you want to deal with such broken pages in your crawler.
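One possible way to deal with such pages is to catch the exception and skip the URL. This is only a minimal sketch, not the only approach: fetch_or_skip is a hypothetical helper name, the get parameter exists only so the function can be exercised without a network, and the 10-second timeout is an assumed value.

```python
import requests


def fetch_or_skip(url, get=requests.get, timeout=10):
    """Return the page text, or None if the URL cannot be fetched."""
    try:
        response = get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.TooManyRedirects:
        # The server keeps redirecting (a redirect loop), so skip this page.
        print("Skipping (redirect loop):", url)
        return None
    except requests.exceptions.RequestException as exc:
        # Any other requests failure: connection error, timeout, HTTP 4xx/5xx.
        print("Skipping (request failed):", url, exc)
        return None
    return response.text
```

Inside trade_spider, the crawler could call fetch_or_skip(url) and move on to the next page whenever it returns None.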

+1

Josh kupershmidt 01 june 15 at 13:12


Ajay · Accepted Answer · 2015-06-01T13:23:57+0000

The URL of this forum has changed.

Two modifications to your code:

1. Use the new forum URL ("https://www.thenewboston.com/forum/recent_activity.php?page=" + str(page)).
2. Pass allow_redirects=False (to disable redirects, if any).

import requests
from bs4 import BeautifulSoup
def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.thenewboston.com/forum/recent_activity.php?page=" + str(page)
        print(url)
        # Do not follow redirects, so a redirect loop cannot raise TooManyRedirects
        source_code = requests.get(url, allow_redirects=False)
        plain_text = source_code.text
        # Name the parser explicitly to avoid the "no parser specified" warning
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('a', {'class': 'post-title'}):
            href = link.get('href')
            print(href)
        page += 1
trade_spider(2)
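One caveat with allow_redirects=False: if the server does answer with a redirect, requests returns the 3xx response itself, and its body will usually not contain the forum links. The sketch below shows one way to see where the server is trying to send you. The helper names redirect_target and fetch_without_following are hypothetical, and the 10-second timeout is an assumed value.

```python
import requests


def redirect_target(response):
    """Return the Location header if the response is a redirect, else None."""
    if response.is_redirect:
        return response.headers.get("Location")
    return None


def fetch_without_following(url):
    """Fetch a URL without following redirects and report any redirect."""
    response = requests.get(url, allow_redirects=False, timeout=10)
    target = redirect_target(response)
    if target is not None:
        print(url, "redirects to", target, "with status", response.status_code)
    return response
```

If a redirect is reported, the printed target is a hint that the URL in the crawler may need updating.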
