How can I read these cells from HTML code using Python web scraping?

Question

I want to scrape exchange rate information from this site and then store it in a database: https://www.mnb.hu/arfolyamok

I need this piece of HTML:

<tbody>
    <tr>
        <td class="valute"><b>CHF</b></td>
        <td class="valutename">svájci frank</td>
        <td class="unit">1</td>
        <td class="value">284,38</td>
    </tr>
    <tr>
        <td class="valute"><b>EUR</b></td>
        <td class="valutename">euro</td>
        <td class="unit">1</td>
        <td class="value">308,54</td>
    </tr>
    <tr>
        <td class="valute"><b>USD</b></td>
        <td class="valutename">USA dollár</td>
        <td class="unit">1</td>
        <td class="value">273,94</td>
    </tr>
</tbody>

This is the code I wrote for it, but something is wrong with it. How can I fix it, and what should I change? I only need the "valute", "valutename", "unit" and "value" data. I am working with Python 2.7.13 on Windows 7.

The error message is: "There is an error in your program: unindent does not match the outer indentation level."

The code is here:

import csv
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://www.mnb.hu/arfolyamok'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html)
table = soup.find('tbody', attrs={'class': 'stripe'})

table = str(soup)
table = table.split("<tbody>")

list_of_rows = []
for row in table[1].findAll('tr')[1:]:
    list_of_cells = []
   for cell in row.findAll('td'):
       text = cell.text.replace('&nbsp;', '')
        list_of_cells.append(text)
   list_of_rows.append(list_of_cells)

print list_of_rows

outfile = open("./inmates.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"])
writer.writerows(list_of_rows)

python html database web-scraping beautifulsoup

tardos93 08 june 17 at 14:35

1 answer

Xonshiz · Answer 1 · 2017-07-13T13:33:24+0000

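You have a whitespace (indentation) problem in your code, from line 18,

for cell in row.findAll('td'):

to line 20,

list_of_cells.append(text)

These lines are not indented consistently with the lines around them, which is what the "unindent does not match" error is complaining about. Here's the fixed code: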

import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.mnb.hu/arfolyamok'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html)
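# Keep the parsed <tbody> element itself. Turning the soup into a string
# and splitting it would leave plain strings, which have no findAll().
table = soup.find('tbody')

list_of_rows = []
# Every <tr> in the <tbody> shown in the question is a data row, so none is skipped.
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        # Encode to UTF-8 bytes so Python 2's csv writer accepts accented text.
        text = cell.text.replace('&nbsp;', '').encode('utf-8')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)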

print list_of_rows

outfile = open("./inmates.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"])
writer.writerows(list_of_rows)

But after running this code you will hit another problem: a character encoding error. It will read:

SyntaxError: Non-ASCII character '\xc3' in file testoasd.py on line 27, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

How to fix it? Simple enough: add the source encoding declaration (it is not a shebang, although it is often called one)

# -*- coding: utf-8 -*-

at the very top of your code (Python 2 only honours it on the first or second line). That should fix it.
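To illustrate where the declaration goes and how the accented header row can then be written out, here is a minimal sketch. The helper name write_csv, the file name rates.csv and the temporary output directory are assumptions made for the example, not part of the original code; the rows are copied from the HTML sample in the question.

```python
# -*- coding: utf-8 -*-
# The line above must be the first or second line of the file (PEP 263).
import csv
import io
import os
import sys
import tempfile

# Header row from the question; it contains non-ASCII characters.
HEADER = [u"Pénznem", u"Devizanév", u"Egység", u"Forintban kifejezett érték"]
# Two rows copied from the question's HTML sample.
ROWS = [
    [u"CHF", u"svájci frank", u"1", u"284,38"],
    [u"EUR", u"euro", u"1", u"308,54"],
]


def write_csv(path, header, rows):
    """Write header and rows to path as UTF-8 on Python 2 or 3 (hypothetical helper)."""
    if sys.version_info[0] < 3:
        # Python 2: csv wants byte strings and a binary-mode file.
        with open(path, "wb") as handle:
            writer = csv.writer(handle)
            for row in [header] + rows:
                writer.writerow([cell.encode("utf-8") for cell in row])
    else:
        # Python 3: csv wants text; newline="" avoids extra blank lines on Windows.
        with io.open(path, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows([header] + rows)


# Hypothetical output location: a temporary directory, file name rates.csv.
out_path = os.path.join(tempfile.mkdtemp(), "rates.csv")
write_csv(out_path, HEADER, ROWS)
```

Because the rates use a decimal comma, the csv module wraps those values in double quotes so they are not mistaken for two separate fields.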

EDIT: I just noticed that you are also importing and using BeautifulSoup in an outdated way. I already changed the import above to

from bs4 import BeautifulSoup

and when creating the BeautifulSoup object you should specify the parser as well (otherwise bs4 guesses one and prints a warning). Thus,

soup = BeautifulSoup(html)

will become:

soup = BeautifulSoup(html, "html.parser")
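For reference, here is a small self-contained sketch of the extraction step with the parser named explicitly. It runs against the HTML fragment from the question rather than the live page, so it assumes the real page has the same structure; SAMPLE_HTML and extract_rows are names invented for this example.

```python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Markup copied from the question; the live page may be structured differently.
SAMPLE_HTML = u"""
<tbody>
    <tr>
        <td class="valute"><b>CHF</b></td>
        <td class="valutename">svájci frank</td>
        <td class="unit">1</td>
        <td class="value">284,38</td>
    </tr>
    <tr>
        <td class="valute"><b>EUR</b></td>
        <td class="valutename">euro</td>
        <td class="unit">1</td>
        <td class="value">308,54</td>
    </tr>
    <tr>
        <td class="valute"><b>USD</b></td>
        <td class="valutename">USA dollár</td>
        <td class="unit">1</td>
        <td class="value">273,94</td>
    </tr>
</tbody>
"""


def extract_rows(html):
    """Return one list of cell texts per <tr> in the first <tbody> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody")
    if tbody is None:
        # No table body in the markup, e.g. if the page is filled in by JavaScript.
        return []
    rows = []
    for tr in tbody.find_all("tr"):
        # get_text(strip=True) also reaches text nested in <b> and trims whitespace.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Each row comes back in the order valute, valutename, unit, value. The values are still strings with a decimal comma, so they would need converting before being stored as numbers in a database.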
