Web page source not available with urllib.urlopen()

Question


I am trying to get video links from 'https://www.youtube.com/trendsdashboard#loc0=ind'. When I inspect the elements in the browser, I can see the HTML for each video. However, the source obtained with

urllib2.urlopen("https://www.youtube.com/trendsdashboard#loc0=ind").read()

does not contain the HTML for the videos. Are there other ways to do this?

<a href="/watch?v=dCdvyFkctOo" alt="Flipkart Wish Chain">
        <img src="//i.ytimg.com/vi/dCdvyFkctOo/hqdefault.jpg" alt="Flipkart Wish Chain">
      </a>

This simple markup appears when I inspect elements in the browser, but not in the source obtained with urllib2.


python urllib2 beautifulsoup

nlper · June 11 at 5:51 am


4 answers

Alexander mcfarlane · Answer 1 · 2015-06-11T06:07:55+0000

It works for me:

import urllib2
url = 'https://www.youtube.com/trendsdashboard#loc0=ind'
# urlopen lives in urllib2 here, matching the import above
html = urllib2.urlopen(url).read()

IMO I would use requests instead of urllib2; it's a little easier to use:

import requests
url = 'https://www.youtube.com/trendsdashboard#loc0=ind'
response = requests.get(url)
html = response.content

Edit

This will give you a list of all the <a> tags that have an href attribute, as per your edit. I am using the BeautifulSoup HTML parsing library:

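from bs4 import BeautifulSoup
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')
# Keep only the <a> tags that actually carry an href attribute
links = [tag for tag in soup.find_all('a') if tag.has_attr('href')]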
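If only the video links are wanted, the same idea can be narrowed to hrefs that start with /watch?v=. Below is a minimal sketch that uses only the Python 3 standard library (html.parser) rather than BeautifulSoup; the names WatchLinkParser and extract_watch_links are made up for illustration, and the sample string is just the snippet from the question plus one unrelated link. It can only find links that are actually present in the HTML it is given.

```python
from html.parser import HTMLParser


class WatchLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at /watch?v=... pages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs
        href = dict(attrs).get("href")
        if href and href.startswith("/watch?v="):
            self.links.append(href)


def extract_watch_links(html):
    """Return all /watch?v=... hrefs found in the given HTML string."""
    parser = WatchLinkParser()
    parser.feed(html)
    return parser.links


# Sample markup taken from the question, plus one link that should be ignored
sample = '''<a href="/watch?v=dCdvyFkctOo" alt="Flipkart Wish Chain">
  <img src="//i.ytimg.com/vi/dCdvyFkctOo/hqdefault.jpg" alt="Flipkart Wish Chain">
</a>
<a href="/about">About</a>'''

print(extract_watch_links(sample))
```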

Ajay · Answer 2 · 2015-06-11T06:08:13+0000

To view the source, you need to call the read method.

If you just call urlopen, you get a response object like this:

In [12]: urllib2.urlopen('https://www.youtube.com/trendsdashboard#loc0=ind')
Out[12]: <addinfourl at 3054207052L whose fp = <socket._fileobject object at 0xb60a6f2c>>

To see the source, use read:

urllib2.urlopen('https://www.youtube.com/trendsdashboard#loc0=ind').read()

Vikas Ojha · Answer 3 · 2015-06-11T06:09:00+0000

Whenever you compare the source fetched by Python code with what the web browser shows, don't do it through Inspect Element. Right-click on the web page and click View Source instead; that shows the actual source sent by the server. Inspect Element displays the aggregated result of all the network requests the page made, together with whatever the executed JavaScript added to it.
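A quick way to confirm this from Python is to check whether the markup you see in Inspect Element occurs in the raw response at all. The sketch below uses Python 3's urllib; fetch_static_source and is_in_static_source are illustrative names, and the two sample strings are invented stand-ins for "what the server sends" and "what the browser shows after scripts run", not real YouTube output. Note also that the part of the URL after # (the fragment) is never sent to the server, so it can only be acted on by JavaScript in the browser.

```python
from urllib.parse import urldefrag
from urllib.request import urlopen


def fetch_static_source(url):
    """Return the HTML as the server sent it; no JavaScript is executed."""
    # The fragment is handled client-side only, so drop it before requesting
    base_url, _fragment = urldefrag(url)
    with urlopen(base_url) as response:
        return response.read().decode("utf-8", errors="replace")


def is_in_static_source(raw_html, marker):
    """True if the marker text occurs in the raw HTML string."""
    return marker in raw_html


# Invented stand-ins (assumptions for illustration, not real page content)
static_source = '<div id="videos"></div><script src="/app.js"></script>'
rendered_dom = '<div id="videos"><a href="/watch?v=dCdvyFkctOo"></a></div>'

print(is_in_static_source(static_source, "/watch?v="))
print(is_in_static_source(rendered_dom, "/watch?v="))
print(urldefrag("https://www.youtube.com/trendsdashboard#loc0=ind").fragment)
```

In real use you would pass the result of fetch_static_source(url) as raw_html. If the marker is missing there, the links are being added by JavaScript after the page loads.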

Open the developer console before opening the web page, stay on the Network tab, and make sure Preserve log is enabled in Chrome, or Persist in Firebug for Firefox. You will then see all the network requests the page makes.
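Once the Network tab shows which request actually carries the video data, that request can often be repeated directly from Python. The sketch below is only an outline under loud assumptions: DATA_URL is a placeholder, and the JSON shape (a list of objects with a video_id key) is hypothetical. The real URL and response format have to be read off the Network tab.

```python
import json
from urllib.request import Request, urlopen

# Placeholder (assumption): substitute the request URL seen in the Network tab
DATA_URL = "https://example.com/hypothetical/trends.json"


def fetch_text(url):
    """Fetch a URL and return the body as text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def watch_links_from_payload(payload_text, id_key="video_id"):
    """Build /watch links from a JSON list of objects (hypothetical shape)."""
    items = json.loads(payload_text)
    return ["/watch?v=" + item[id_key] for item in items if id_key in item]


# Hypothetical payload, used here instead of a live request
sample_payload = (
    '[{"video_id": "dCdvyFkctOo", "title": "Flipkart Wish Chain"},'
    ' {"title": "entry without an id"}]'
)
print(watch_links_from_payload(sample_payload))
```

With a real endpoint, the call would look like watch_links_from_payload(fetch_text(DATA_URL)), adjusted to whatever structure the response really has.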

rishav · Answer 4 · 2016-01-11T17:44:32+0000

