Scraping a JavaScript-generated web page

I am trying to write a small Python program that fetches the PewDiePie subscriber count, which is updated every second on Social Blade, and shows it in the terminal. I need this data every 30 seconds.

I tried PyQt, but it is slow. I then switched to dryscrape, which is a little faster but still doesn't work the way I want. I just found Invader and wrote a short script, but it has the same problem: the number returned is the one from before the JavaScript on the page is executed:

from invader import Invader

url = 'https://socialblade.com/youtube/user/pewdiepie/realtime'
invader = Invader(url, js=True)

subscribers = invader.take(['#rawCount', 'text'])
print(subscribers.text)

      

I know this data is available through the site's API, but that doesn't always work; sometimes it just redirects to this.

Is there a way to get this number after the JavaScript on the page has updated the counter, rather than before? And which approach do you think is best, extracting it:

  • from the original page, which returns the same number for hours?
  • from the API page, which throws errors when cookies are used in code and after a certain amount of time?

Thanks for your advice!



1 answer


I have had success with dryscrape as described in the following post.



Web scraping a JavaScript page with Python
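
To make that concrete, here is a minimal sketch of how the dryscrape approach could be combined with the 30-second polling the question asks for. It is not taken from the linked post. The helper names (parse_count, fetch_rendered_html, poll) and the settle_seconds delay are hypothetical, and the rawCount element id is taken from the question's code. It assumes that dryscrape is installed and that waiting a couple of seconds after the page loads is enough for the page's script to update the counter, which may not hold for this site.

```python
import re
import time
from html.parser import HTMLParser


class _IdTextParser(HTMLParser):
    """Collects the text inside the first element with a given id.

    Assumes the element holds plain text or simple, properly closed tags.
    """

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # > 0 while we are inside the target element
        self.done = False   # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def parse_count(html, element_id="rawCount"):
    """Return the counter as an int, or None if the element is missing or empty."""
    parser = _IdTextParser(element_id)
    parser.feed(html)
    digits = re.sub(r"\D", "", "".join(parser.chunks))
    return int(digits) if digits else None


def fetch_rendered_html(url, settle_seconds=2.0):
    """Load the page in dryscrape and return the DOM after scripts have run."""
    import dryscrape  # third-party; imported here so parsing works without it

    session = dryscrape.Session()
    session.visit(url)
    time.sleep(settle_seconds)  # assumption: enough time for the counter to update
    return session.body()


def poll(fetch, interval=30.0, iterations=None, sleep=time.sleep):
    """Yield one parsed count per fetch, pausing `interval` seconds in between."""
    n = 0
    while iterations is None or n < iterations:
        yield parse_count(fetch())
        n += 1
        if iterations is None or n < iterations:
            sleep(interval)
```

To use it, loop over poll(lambda: fetch_rendered_html(url)) with the url from the question and print each value. Passing the fetch function in as an argument means the parsing and timing can be tried against saved HTML before pointing it at the live page.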
