Scraping a JavaScript-generated web page
I am trying to write a small Python program that fetches the PewDiePie subscriber count, which is updated every second on Social Blade, and shows it in the terminal. I need this data every 30 seconds.
I tried PyQt, but it is slow. I then turned to dryscrape, which is a little faster but still doesn't work the way I want. I just found Invader and wrote a short script, but it has the same problem: the number returned is the one from before the JavaScript on the page runs:
from invader import Invader
url = 'https://socialblade.com/youtube/user/pewdiepie/realtime'
invader = Invader(url, js=True)
subscribers = invader.take(['#rawCount', 'text'])
print(subscribers.text)
I know this data is available through the site's API, but that doesn't always work; sometimes it just redirects to this.
Is there a way to get this number after the JavaScript on the page has updated the counter, rather than before? And which approach do you think is best? Should I extract it:
- from the original page, which returns the same number for hours?
- from the API page, which throws errors when cookies are used in code and after a certain amount of time?
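Whichever source is used, the 30-second polling itself could look something like the sketch below. This is only an illustration under assumptions: `poll` and its `fetch` argument are hypothetical names, and `fetch` stands in for whatever function ends up returning the count (from the rendered page or from the API). A failed fetch is recorded as None so that one bad response or redirect does not stop the loop.

```python
import time


def poll(fetch, interval=30.0, rounds=3, sleep=time.sleep):
    """Call fetch() once per round, waiting `interval` seconds between rounds.

    `fetch` is a hypothetical callable that returns the subscriber count.
    If it raises, the error is printed and None is recorded for that round.
    """
    results = []
    for i in range(rounds):
        try:
            value = fetch()
        except Exception as exc:  # e.g. a network error or unexpected redirect
            print(f"fetch failed: {exc}")
            value = None
        else:
            print(value)
        results.append(value)
        # Do not wait after the final round.
        if i < rounds - 1:
            sleep(interval)
    return results
```

Passing `sleep` as a parameter is just a convenience so the loop can be tried out without actually waiting 30 seconds between calls.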
Thanks for your advice!