How can I scrape these table cells from an HTML page with Python?
I want to scrape exchange rate information from this site and then store it in a database: https://www.mnb.hu/arfolyamok
I need this piece of HTML:
<tbody>
  <tr>
    <td class="valute"><b>CHF</b></td>
    <td class="valutename">svájci frank</td>
    <td class="unit">1</td>
    <td class="value">284,38</td>
  </tr>
  <tr>
    <td class="valute"><b>EUR</b></td>
    <td class="valutename">euro</td>
    <td class="unit">1</td>
    <td class="value">308,54</td>
  </tr>
  <tr>
    <td class="valute"><b>USD</b></td>
    <td class="valutename">USA dollár</td>
    <td class="unit">1</td>
    <td class="value">273,94</td>
  </tr>
</tbody>
This is the code I wrote for it, but something is wrong with it. How can I fix it, and what should I change? I only want the "valute", "valutename", "unit" and "value" data. I am working with Python 2.7.13 on Windows 7.
The error message is: "There is an error in your program: unindent does not match the outer indentation level."
The code is here:
import csv
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://www.mnb.hu/arfolyamok'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html)

table = soup.find('tbody', attrs={'class': 'stripe'})
table = str(soup)
table = table.split("<tbody>")

list_of_rows = []
for row in table[1].findAll('tr')[1:]:
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace(' ', '')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
print list_of_rows

outfile = open("./inmates.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"])
writer.writerows(list_of_rows)
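You have an indentation problem in your code from line 18 (for cell in row.findAll('td'):) to line 20 (list_of_cells.append(text)): those lines are not indented consistently with the surrounding block, which often means tabs and spaces are mixed. Here's the fixed code: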
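import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.mnb.hu/arfolyamok'
response = requests.get(url)
html = response.content
# Name the parser explicitly instead of letting bs4 guess
soup = BeautifulSoup(html, "html.parser")

# Search the parsed tree directly: str(soup).split("<tbody>") only gives
# plain strings, and a string has no findAll() method.
# Each wanted row contains a <td class="valute"> cell, so start from those.
list_of_rows = []
for valute_cell in soup.find_all('td', class_='valute'):
    row = valute_cell.find_parent('tr')
    list_of_cells = []
    # Keep only the four columns from the question, in this order
    for name in ('valute', 'valutename', 'unit', 'value'):
        cell = row.find('td', class_=name)
        # get_text(strip=True) trims leading/trailing whitespace (incl. &nbsp;)
        text = cell.get_text(strip=True)
        # Python 2's csv module writes byte strings, so encode the unicode text
        list_of_cells.append(text.encode('utf-8'))
    list_of_rows.append(list_of_cells)
print list_of_rows

with open("./inmates.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["Pénznem", "Devizanév", "Egység", "Forintban kifejezett érték"])
    writer.writerows(list_of_rows)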
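As a side note on the indentation error itself, here is a small standard-library sketch (it should run on both Python 2.7 and 3) that compiles a snippet whose dedent lands on a column no enclosing block uses, and then a consistently indented version of the same snippet. The helper name check_indentation and the sample snippets are made up for illustration.

```python
def check_indentation(source):
    """Compile `source`; return None if it is fine, else the IndentationError message."""
    try:
        compile(source, "<snippet>", "exec")
    except IndentationError as exc:
        return exc.msg
    return None


# The loop body is indented 8 spaces, but the next line dedents to 4 spaces,
# a level that no enclosing block uses.
broken = (
    "for cell in ['a', 'b']:\n"
    "        text = cell.upper()\n"
    "    print(text)\n"
)

# The same code with one consistent indentation step (4 spaces).
fixed = (
    "for cell in ['a', 'b']:\n"
    "    text = cell.upper()\n"
    "    print(text)\n"
)

# Prints the "unindent does not match ..." message for the broken snippet,
# then None for the fixed one.
print(check_indentation(broken))
print(check_indentation(fixed))
```

In a real file the two offending lines can look aligned in an editor when one uses a tab and the other spaces, so converting all indentation to spaces is usually the quickest fix.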
But once you try to run the fixed code, you will hit another problem, a character encoding error. Python will report:
SyntaxError: Non-ASCII character '\xc3' in file testoasd.py on line 27, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
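How do you fix it? Simple: add the encoding declaration # -*- coding: utf-8 -*- at the very top of your file (the first line). This is not a shebang (that is a line like #!/usr/bin/env python); according to PEP 263 the declaration may also go on the second line, directly after a shebang. Also make sure the file is actually saved as UTF-8. That should fix it.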
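To illustrate how Python recognises that declaration, here is a rough sketch that applies the regular expression given in the Python language reference (PEP 263) to the first two lines of a source file. The helper name declared_encoding is hypothetical, and it is simplified: CPython has a few extra rules (for example about what may precede a declaration on line 2) that are not modelled here.

```python
import re

# Encoding-declaration pattern from the Python language reference (PEP 263).
# Python only looks for it on the first two lines of a file.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source):
    """Return the encoding declared in the first two lines of `source`
    (a bytes object), or None if there is no declaration there."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


script = b"# -*- coding: utf-8 -*-\nimport csv\n"
with_shebang = b"#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nimport csv\n"
too_late = b"import csv\nimport requests\n# -*- coding: utf-8 -*-\n"

print(declared_encoding(script))        # utf-8
print(declared_encoding(with_shebang))  # utf-8
print(declared_encoding(too_late))      # None
```

The last case is why the declaration has to sit at the top: a coding comment further down the file is just an ordinary comment.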
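EDIT: I just noticed that you are also importing and using BeautifulSoup incorrectly. from BeautifulSoup import BeautifulSoup is the import for the old BeautifulSoup 3 package; with bs4 (which the code above already uses) it is from bs4 import BeautifulSoup. In addition, when creating the BeautifulSoup object you should name the parser explicitly; otherwise bs4 picks the best parser installed on the machine (so results can differ between computers) and emits a warning. Thus,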
soup = BeautifulSoup(html)
will become:
soup = BeautifulSoup(html, "html.parser")
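To check the extraction logic without hitting the website, here is a sketch that feeds the first two rows from the question's HTML straight into BeautifulSoup with "html.parser" and keeps only the four cells named in the question. It assumes bs4 is installed; the function name parse_rates and the SAMPLE_HTML string are illustrative, not part of the original code. It is written to run on Python 2.7 (hence the coding line and u"" literal) as well as Python 3.

```python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# First two rows copied from the question's HTML, used instead of a live request.
SAMPLE_HTML = u"""
<tbody>
<tr>
<td class="valute"><b>CHF</b></td>
<td class="valutename">svájci frank</td>
<td class="unit">1</td>
<td class="value">284,38</td>
</tr>
<tr>
<td class="valute"><b>EUR</b></td>
<td class="valutename">euro</td>
<td class="unit">1</td>
<td class="value">308,54</td>
</tr>
</tbody>
"""

# The four columns the question asks for, in output order.
FIELDS = ("valute", "valutename", "unit", "value")


def parse_rates(html):
    """Return one list of cell texts per row that has a <td class="valute"> cell."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for code_cell in soup.find_all("td", class_="valute"):
        row = code_cell.find_parent("tr")
        # class_ matches whole class names, so "value" does not match "valute"
        rows.append([row.find("td", class_=name).get_text(strip=True)
                     for name in FIELDS])
    return rows


for row in parse_rates(SAMPLE_HTML):
    print(row)
```

The value column keeps the Hungarian decimal comma (e.g. "284,38"), so before storing it as a number in the database you would need something like float(text.replace(",", ".")).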