Scraping an HTML page for a value
I created a Python program with BeautifulSoup that should find a specific value on a site, but the program doesn't seem to find the value.
import bs4
from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as soup
my_url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'
uclient = ureq(my_url)
page_html = uclient.read()
uclient.close()
page_soup = soup(page_html, "html.parser")
value = page_soup.find("td",{"class":"RightBlack"})
print(value)
The value I'm trying to find is the dollar converted to Israeli currency, but for some reason the line of code that should get this value:
value = page_soup.find("td",{"class":"RightBlack"})
can't find it.
1. First option, what you can do using BeautifulSoup
Note that the item you want is inside an iframe, which means it is loaded by a different request than the one you made. You can write code that iterates over all the iframes and prints the price as soon as iframe_soup.find("td",{"class":"RightBlack"}) finds it in one of them.
I would recommend wrapping each request in a try/except statement, as it is easy to run into bad URLs when doing so:
from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as soup

my_url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'
uclient = ureq(my_url)
page_html = uclient.read()
page_soup = soup(page_html, "html.parser")

# Each iframe is a separate document, so fetch and parse them one by one.
iframesList = page_soup.find_all('iframe')
i = 1
for iframe in iframesList:
    print(i, ' out of ', len(iframesList), '...')
    try:
        uclient = ureq("http://www.calcalist.co.il"+iframe.attrs['src'])
        iframe_soup = soup(uclient.read(), "html.parser")
        price = iframe_soup.find("td",{"class":"RightBlack"})
        if price:
            print(price)
            break
    except Exception:
        # Missing src attribute, bad URL, or failed request
        print("something went wrong")
    i+=1
Running the code prints:
1 out of 8 ...
2 out of 8 ...
3 out of 8 ...
4 out of 8 ...
5 out of 8 ...
<td class="RightBlack">3.5630</td>
So now we have what we want:
>>> price
<td class="RightBlack">3.5630</td>
>>> price.text
'3.5630'
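On the URL problems mentioned above: gluing "http://www.calcalist.co.il" onto every src only works when the src is a root-relative path. The following is a minimal sketch of a more forgiving way to build the iframe URLs, using urljoin from the standard library. To keep it runnable without network access or extra packages, it collects the iframes with the standard library's HTMLParser as a stand-in for page_soup.find_all('iframe'). The names IframeSrcCollector and iframe_urls, the example.com URLs and the sample markup are all made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:  # skip iframes that have no src attribute
                self.sources.append(src)


def iframe_urls(page_url, page_html):
    """Return an absolute URL for each iframe found in page_html."""
    collector = IframeSrcCollector()
    collector.feed(page_html)
    # urljoin handles relative, root-relative, protocol-relative
    # and already-absolute src values.
    return [urljoin(page_url, src) for src in collector.sources]


# Hypothetical page: the URL and the markup are invented for this example.
PAGE_URL = "http://www.example.com/stocks/home/index.html"
PAGE_HTML = """
<html><body>
  <iframe src="/stocks/quote.html"></iframe>
  <iframe src="widget.html"></iframe>
  <iframe src="//cdn.example.com/ad.html"></iframe>
  <iframe src="https://other.example.com/frame.html"></iframe>
  <iframe></iframe>
</body></html>
"""

for url in iframe_urls(PAGE_URL, PAGE_HTML):
    print(url)
```

In the loop from the first option, the same idea would amount to passing urljoin(my_url, iframe.attrs['src']) to ureq instead of the concatenated string.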
2. Second option, use Selenium
This is a recommendation: to handle JavaScript requests and processing, you should use Selenium with a browser driver underneath. I use ChromeDriver, but you can also use PhantomJS for headless browsing. By inspecting the frame element, we know that its id is "StockQuoteIFrame"; we use it to get into the frame with .switch_to_frame, and then we can easily find our price:
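from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'
browser = webdriver.Chrome()
browser.get(url)
# Enter the iframe that holds the quote. switch_to.frame(...) and find_element(By..., ...)
# are the Selenium 4 spellings of the older switch_to_frame(...) and find_element_by_*(...).
browser.switch_to.frame(browser.find_element(By.ID, "StockQuoteIFrame"))
price = browser.find_element(By.CLASS_NAME, "RightBlack").text
browser.quit()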
The output is, of course, the same as the first option:
>>> price
'3.5630'