HTML web scraping for a value

I created a Python program with BeautifulSoup that should find a specific value on a site, but the program doesn't seem to find the value.

import bs4
from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as soup
my_url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'
uclient = ureq(my_url)
page_html = uclient.read()
uclient.close()
page_soup = soup(page_html, "html.parser")
value = page_soup.find("td",{"class":"RightBlack"})
print(value)

      

The value I'm trying to find is the dollar converted to Israeli currency, but for some reason the line of code that should get this value:

value = page_soup.find("td",{"class":"RightBlack"})

      

can't find it.



1 answer


1. First option, what you can do using BeautifulSoup

Note that the item you want is inside an iframe, which means it comes from a different request than the one you made. You can write code that iterates over all the iframes and prints the price as soon as iframe_soup.find("td",{"class":"RightBlack"}) finds it.

I would recommend using a try/except statement, as it is easy to fall into URL traps when doing so:

from urllib.parse import urljoin
from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as soup

my_url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'
uclient = ureq(my_url)
page_html = uclient.read()
uclient.close()
page_soup = soup(page_html, "html.parser")

# The price lives in one of the page's iframes, so fetch each one in turn.
iframesList = page_soup.find_all('iframe')
for i, iframe in enumerate(iframesList, 1):
    print(i, ' out of ', len(iframesList), '...')
    try:
        # urljoin copes with relative and absolute src values alike.
        uclient = ureq(urljoin(my_url, iframe.attrs['src']))
        iframe_soup = soup(uclient.read(), "html.parser")
        uclient.close()
        price = iframe_soup.find("td",{"class":"RightBlack"})
        if price:
            print(price)
            break
    except Exception as e:
        # A missing src or an unreachable frame should not stop the search.
        print("something went wrong:", e)

      

Running the code gives this output:

1  out of  8 ...
2  out of  8 ...
3  out of  8 ...
4  out of  8 ...
5  out of  8 ...
<td class="RightBlack">3.5630</td>

      

So now we have what we want:



>>> price
<td class="RightBlack">3.5630</td>
>>> price.text
'3.5630'
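
The "URL traps" mentioned above come from the fact that an iframe's src can be root-relative, relative, protocol-relative, or already absolute, and prefixing the site root only works for the first case. Here is a minimal sketch of resolving all of them with the standard library's urllib.parse.urljoin. The helper name resolve_frame_url and the src values are hypothetical, chosen only to illustrate the cases; they are not taken from the real page.

```python
from urllib.parse import urljoin

# The page that embeds the iframes (query string omitted for brevity).
PAGE_URL = "http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html"


def resolve_frame_url(page_url, src):
    """Turn an iframe's src attribute into an absolute URL."""
    return urljoin(page_url, src)


# Hypothetical src values, one per case.
examples = [
    "/stocks/quote.html",           # root-relative
    "quote.html",                   # relative to the page's directory
    "//example.com/widget.html",    # protocol-relative
    "https://example.com/ad.html",  # already absolute
]
for src in examples:
    print(src, "->", resolve_frame_url(PAGE_URL, src))
```

Plain string concatenation with the site root would build a broken URL for the last three cases, which is the kind of failure the try/except is there to absorb.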

      


2. Second option, use Selenium

This is my recommendation: to handle pages that make JavaScript requests and do JavaScript processing, you should drive a real browser with Selenium. I use ChromeDriver, but you can also use PhantomJS (or, in newer Selenium releases, headless Chrome) for headless browsing. By inspecting the frame element, we learn that its id is "StockQuoteIFrame", which we use to get there with .switch_to.frame, and then we can easily find our price:

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'

browser = webdriver.Chrome()
browser.get(url)

# Selenium 4 syntax; older releases spelled these switch_to_frame and find_element_by_id.
browser.switch_to.frame(browser.find_element(By.ID, "StockQuoteIFrame"))
price = browser.find_element(By.CLASS_NAME, "RightBlack").text
browser.quit()

      

The output is, of course, the same as the first option:

>>> price
'3.5630'
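
Either way the result is a string, so it usually needs converting before you can calculate with it. A minimal sketch, assuming the scraped text is a plain decimal number and that any comma is a thousands separator; the helper name parse_rate is hypothetical:

```python
from decimal import Decimal, InvalidOperation


def parse_rate(text):
    """Convert scraped text such as '3.5630' to a Decimal, or None if it is not numeric."""
    # Assumption: commas, if present, are thousands separators.
    cleaned = text.strip().replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


rate = parse_rate("3.5630")  # the value scraped above
print(rate)
print(rate * 100)  # 100 dollars in shekels, assuming the quote is shekels per dollar
```

Decimal is used here instead of float so the quoted digits are kept exactly as the site printed them.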

      
