Node.js scraping with chrome-remote-interface
I am trying to scrape a site protected by Distil Networks, where using Selenium (with Python) always fails.
I did some searching and came to the conclusion that the site can detect Selenium using some kind of JavaScript. Then I took a look at chrome-remote-interface, which seems to be what I want, but then I got stuck.
What I would like to do is automate the following steps:
- Open a Chrome instance
- Go to the page
- Run some javascript
- Collect data and save it to a file
- Repeat steps 2 - 4
I know that I can open a Chrome instance for debugging:
google-chrome --remote-debugging-port=9222
And I can open a console on node with:
chrome-remote-interface -t 127.0.0.1 -p 9222 inspect -r
I can also run simple scripts like
Page.navigate({url:"https://google.com"})
Runtime.evaluate({expression:"1+1"})
But how can I get at the DOM directly from Node.js, the way I can in the Chrome Developer Tools console? Basically, I want to run scripts from Node the same way I would run them in the DevTools console.
In addition, chrome-remote-interface does not have much documentation about scraping. Are there any good links for this?
JavaScript expressions evaluated with Runtime.evaluate are executed in the context of the page, just like what happens in the DevTools console.
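As a minimal sketch of that, assuming Chrome is already running with --remote-debugging-port and the chrome-remote-interface package is installed, something like the following should print the page title. The helper name evaluateInPage and the CDP_PORT environment variable are my own inventions for this example, not part of the library.

```javascript
// Sketch: evaluate an expression in the page context via Runtime.evaluate.
// `evaluateInPage` is a hypothetical helper; `client` is a connected
// chrome-remote-interface client (anything exposing Runtime.evaluate works).
async function evaluateInPage(client, expression) {
  const { result, exceptionDetails } = await client.Runtime.evaluate({
    expression,
    returnByValue: true, // return a JSON value instead of a remote object handle
    awaitPromise: true,  // if the expression yields a promise, wait for it
  });
  if (exceptionDetails) {
    throw new Error(`Evaluation failed: ${exceptionDetails.text}`);
  }
  return result.value;
}

async function main() {
  // Loaded lazily so the helper above can be used without the package.
  const CDP = require('chrome-remote-interface');
  const client = await CDP({
    host: '127.0.0.1',
    port: Number(process.env.CDP_PORT), // hypothetical env var, e.g. CDP_PORT=9222
  });
  try {
    const { Page } = client;
    await Page.enable();
    await Page.navigate({ url: 'https://google.com' });
    await Page.loadEventFired();
    console.log(await evaluateInPage(client, 'document.title'));
  } finally {
    await client.close();
  }
}

// Only talk to a real browser when CDP_PORT is set (assumption of this sketch).
if (process.env.CDP_PORT) {
  main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Setting returnByValue matters for scraping: without it you get a handle to a remote object rather than the data itself.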
You can also interact with the DOM through the protocol's DOM domain, for example DOM.getDocument, DOM.querySelector, etc.
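A rough sketch of the DOM-domain route, under the same assumptions (Chrome listening on a debugging port, chrome-remote-interface installed). The helper getOuterHTMLBySelector and the CDP_PORT environment variable are made up for this example.

```javascript
// Hypothetical helper: outer HTML of the first node matching `selector`,
// using the protocol's DOM domain instead of evaluating JavaScript in the page.
async function getOuterHTMLBySelector(client, selector) {
  const { DOM } = client;
  const { root } = await DOM.getDocument(); // root node of the current document
  const { nodeId } = await DOM.querySelector({ nodeId: root.nodeId, selector });
  if (!nodeId) return null; // Chrome reports "no match" as nodeId 0
  const { outerHTML } = await DOM.getOuterHTML({ nodeId });
  return outerHTML;
}

async function main() {
  // Loaded lazily so the helper above can be used without the package.
  const CDP = require('chrome-remote-interface');
  const client = await CDP({
    host: '127.0.0.1',
    port: Number(process.env.CDP_PORT), // hypothetical env var, e.g. CDP_PORT=9222
  });
  try {
    const { Page } = client;
    await Page.enable();
    await Page.navigate({ url: 'https://google.com' });
    await Page.loadEventFired();
    console.log(await getOuterHTMLBySelector(client, 'title'));
  } finally {
    await client.close();
  }
}

// Only talk to a real browser when CDP_PORT is set (assumption of this sketch).
if (process.env.CDP_PORT) {
  main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

For plain data extraction, Runtime.evaluate with a document.querySelector expression is often shorter; the DOM domain is handy when you want node ids to pass to other protocol methods.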
Also remember that chrome-remote-interface is primarily a library, meaning it allows you to write your own Node.js applications; chrome-remote-interface inspect is just a utility.
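To give an idea of what such an application could look like, here is a hedged sketch of the navigate / run JavaScript / save loop from the question. It assumes Chrome was started separately with --remote-debugging-port; scrapePages, the URL list, the results.json file name, and the CDP_PORT environment variable are all placeholders of mine.

```javascript
const fs = require('fs').promises;

// Hypothetical helper: visit each URL in turn, evaluate `expression` in the
// page, and collect the values.
async function scrapePages(client, urls, expression) {
  const { Page, Runtime } = client;
  await Page.enable(); // needed to receive Page.loadEventFired
  const results = [];
  for (const url of urls) {
    const loaded = Page.loadEventFired(); // start waiting before navigating
    await Page.navigate({ url });
    await loaded;
    const { result } = await Runtime.evaluate({ expression, returnByValue: true });
    results.push({ url, value: result.value });
  }
  return results;
}

async function main() {
  // Loaded lazily so scrapePages can be used without the package.
  const CDP = require('chrome-remote-interface');
  const client = await CDP({
    host: '127.0.0.1',
    port: Number(process.env.CDP_PORT), // hypothetical env var, e.g. CDP_PORT=9222
  });
  try {
    const urls = ['https://google.com']; // placeholder list of pages to visit
    const results = await scrapePages(client, urls, 'document.title');
    await fs.writeFile('results.json', JSON.stringify(results, null, 2));
  } finally {
    await client.close();
  }
}

// Only talk to a real browser when CDP_PORT is set (assumption of this sketch).
if (process.env.CDP_PORT) {
  main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Pages are visited one after another in a single tab here, which keeps the load-event handling simple; error handling for failed navigations is left out.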
There are several places where you can get help:
- open an issue on chrome-remote-interface;
- the chrome-remote-interface wiki;
- the Chrome DevTools Protocol Viewer;
- the Chrome Debugging Protocol Google Group.
If you ask something more specific, I would be happy to help you with that.
Finally, you can take a look at automated-chrome-profiling, which I think is structurally similar to what you are trying to achieve.