Scraping a JavaScript-generated web page

Question

I am trying to write a small Python program that fetches the PewDiePie subscriber count, which Social Blade updates every second, and shows it in the terminal. I need a fresh value every 30 seconds.

I tried PyQt, but it is slow. I then switched to dryscrape, which is a little faster but still doesn't work the way I want. I just found Invader and wrote a short script with it, but it has the same problem: the number returned is the one from before the JavaScript on the page runs:

from invader import Invader

url = 'https://socialblade.com/youtube/user/pewdiepie/realtime'
invader = Invader(url, js=True)

subscribers = invader.take(['#rawCount', 'text'])
print(subscribers.text)

I know this data is available through the site API, but it doesn't always work; sometimes it just redirects to this.

Is there a way to get this number after the JavaScript on the page has updated the counter, rather than before? And which source do you think is better to extract it from:

the original page, which returns the same number for hours?
the API page, which throws errors when cookies are used in the code and after a certain amount of time?

Thanks for your advice!

+3

javascript python web-scraping

Xewi 04 Aug 17 at 20:37

1 answer

Caleb gates · Answer 1 · 2017-08-04T22:46:43+0000

I have had success with dryscrape as described in the following post.

Web scraping a JavaScript page with Python
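
For reference, a minimal sketch of that approach, not a tested solution for this particular site. It assumes dryscrape is installed, that the counter is still the element with id `rawCount` from the question, and that this element contains only text. The names `render_page`, `parse_count` and `poll` are hypothetical helpers made up for this example, and the 2-second wait for the page's JavaScript is a guess you would need to tune.

```python
import re
import time
from html.parser import HTMLParser


class RawCountParser(HTMLParser):
    """Collects the text of the element whose id is 'rawCount'.

    Assumes the element holds plain text only: collection stops at
    the first closing tag seen after the element opens.
    """

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == "rawCount":
            self._inside = True

    def handle_endtag(self, tag):
        self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def parse_count(html):
    """Return the counter as an int, or None if it is not in the HTML."""
    parser = RawCountParser()
    parser.feed(html)
    digits = re.sub(r"\D", "", "".join(parser.chunks))  # drop commas, spaces
    return int(digits) if digits else None


def render_page(url, wait=2.0):
    """Load the page in dryscrape and return the HTML after scripts ran."""
    import dryscrape  # imported here so the helpers above work without it

    # On a headless Linux machine you may need dryscrape.start_xvfb() first.
    session = dryscrape.Session()
    session.visit(url)
    time.sleep(wait)  # assumption: gives the page's JavaScript time to update
    return session.body()


def poll(fetch, handle, interval=30, iterations=None, sleep=time.sleep):
    """Call handle(fetch()) every `interval` seconds.

    Runs forever when iterations is None.
    """
    done = 0
    while iterations is None or done < iterations:
        if done:
            sleep(interval)
        handle(fetch())
        done += 1
```

To print the count every 30 seconds you would call something like `poll(lambda: parse_count(render_page(url)), print)` with the `url` from the question. Keeping the fetch step as a plain function argument also makes it easy to swap in the API page if that turns out to be more reliable.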
