Web page source not available with urllib.urlopen()

I am trying to get video links from 'https://www.youtube.com/trendsdashboard#loc0=ind'. When I inspect the elements in the browser, it shows the HTML for each video, but the source obtained with

urllib2.urlopen("https://www.youtube.com/trendsdashboard#loc0=ind").read()

does not contain the HTML for the videos. Are there other ways to do this?

<a href="/watch?v=dCdvyFkctOo" alt="Flipkart Wish Chain">
        <img src="//i.ytimg.com/vi/dCdvyFkctOo/hqdefault.jpg" alt="Flipkart Wish Chain">
      </a>


This simple markup appears when I inspect elements in the browser, but not in the source obtained with urllib.


Works for me:

import urllib2
url = 'https://www.youtube.com/trendsdashboard#loc0=ind'
html = urllib2.urlopen(url).read()  # Python 2; the module imported above is urllib2, not urllib

IMO I would use requests instead of urllib; it's a little easier to use:

import requests
url = 'https://www.youtube.com/trendsdashboard#loc0=ind'
response = requests.get(url)
html = response.content  # raw bytes; response.text gives the body decoded to str



Edit

This will give you a list of all the <a> tags that have an href attribute, as per your edit. I am using the BeautifulSoup HTML parsing library:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly so bs4 does not have to guess
links = [tag for tag in soup.find_all('a') if tag.has_attr('href')]
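If bs4 isn't installed, roughly the same thing can be done with the standard library. Below is a minimal Python 3 sketch that collects every <a href> and keeps the v= parameter of /watch links, using the markup from the question as input. LinkCollector and video_ids are hypothetical helper names, not part of any library. Note that it can only find links that are actually in the HTML you downloaded; if the page builds them with JavaScript, the list will simply be empty.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def video_ids(page):
    """Return the v= parameter of every /watch link in page, in page order."""
    parser = LinkCollector()
    parser.feed(page)
    parser.close()
    ids = []
    for href in parser.hrefs:
        parsed = urlparse(href)
        # Keep only links like /watch?v=dCdvyFkctOo (relative or absolute)
        if parsed.path == "/watch":
            ids.extend(parse_qs(parsed.query).get("v", []))
    return ids


snippet = '''<a href="/watch?v=dCdvyFkctOo" alt="Flipkart Wish Chain">
  <img src="//i.ytimg.com/vi/dCdvyFkctOo/hqdefault.jpg" alt="Flipkart Wish Chain">
</a>'''
print(video_ids(snippet))  # ['dCdvyFkctOo']
```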


To view the source, you need to call the read method. If you just call urlopen, you get something like this:

In [12]: urllib2.urlopen('https://www.youtube.com/trendsdashboard#loc0=ind')
Out[12]: <addinfourl at 3054207052L whose fp = <socket._fileobject object at 0xb60a6f2c>>




To see the source, call read():

urllib2.urlopen('https://www.youtube.com/trendsdashboard#loc0=ind').read()
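The same distinction in Python 3, where urllib2.urlopen became urllib.request.urlopen: the call returns a file-like response object, and read() returns the body as bytes. This is a sketch under one assumption: so that it runs offline, a data: URL wrapping the snippet from the question stands in for the real page.

```python
# Python 3 sketch: urllib2.urlopen lives on as urllib.request.urlopen.
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical stand-in page: a data: URL wrapping the snippet from the
# question, so the example runs without network access.
SNIPPET = '<a href="/watch?v=dCdvyFkctOo" alt="Flipkart Wish Chain">Flipkart Wish Chain</a>'
PAGE_URL = "data:text/html;charset=utf-8," + quote(SNIPPET)

response = urlopen(PAGE_URL)
print(response)              # a file-like response object, not the page text
raw = response.read()        # read() returns the body as bytes
html = raw.decode("utf-8")   # decode the bytes to get a str
response.close()
print(html)
```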


Whenever you compare the source your Python code receives with what the web browser shows, don't use Inspect Element. Right-click on the page and choose View Page Source instead; that shows the actual source the server returned. Inspect Element displays the live DOM, which reflects the results of all the network requests the page has made plus whatever its JavaScript has changed.

To see those requests, open the developer console before loading the page and stay on the Network tab. Make sure "Preserve log" is enabled in Chrome (or "Persist" in Firebug for Firefox); you will then see every network request the page makes.
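A related detail about this particular URL: everything after # is a fragment, and HTTP clients never send it to the server. The browser keeps it and the page's JavaScript can read it, so whatever the page shows for loc0=ind is decided client-side, and urllib only ever requests the address without it. A small sketch using the standard urldefrag function:

```python
from urllib.parse import urldefrag

url = "https://www.youtube.com/trendsdashboard#loc0=ind"
requested, fragment = urldefrag(url)

# Only `requested` is sent in the HTTP request; the fragment stays in the
# browser, where the page's JavaScript can read it.
print(requested)  # https://www.youtube.com/trendsdashboard
print(fragment)   # loc0=ind
```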


We also need to decode the data as UTF-8. Here is the code:

html = response.read().decode('utf-8')  # response is what urlopen returned; decode() returns a new str
print(html)
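Hardcoding UTF-8 works for most pages, but the server usually declares its encoding in the Content-Type header. A minimal Python 3 sketch that prefers the declared charset and falls back to UTF-8 only when none is given; fetch_text is a hypothetical helper name, and the UTF-8 fallback is an assumption, not something the server guarantees.

```python
from urllib.request import urlopen


def fetch_text(url, default_charset="utf-8"):
    """Fetch url and return its body as str.

    Uses the charset declared in the Content-Type header when there is one,
    and falls back to default_charset (assumed UTF-8) otherwise.
    """
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or default_charset
        return response.read().decode(charset)
```

Usage would be along the lines of print(fetch_text('https://www.youtube.com/trendsdashboard')). It still returns only the raw source, not what JavaScript adds later.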
