Scraper using NightmareJS with NodeJS
I am trying to scrape a web page and store the results in my database. I am using NodeJS (sails.js framework).
This is a working example using cheerio:
    getRequest('some-url').then((data) => {
        const $ = cheerio.load(data);
        let title = $('.title').each(function (i, element) {
            let a = $(this);
            let title = a.text(); // Title
            MyModel.create({title : title}).exec((err, event) => {
            });
        });
    });
The problem with cheerio is that it does not behave like a browser: it does not execute JavaScript, so it cannot see content that is rendered client-side.
So I decided to try Nightmare.js, and doing the same thing turned out to be a nightmare:
    var articles = [];
    Promise.resolve(nightmare
        .goto('some-url')
        .wait(0)
        .inject('js', 'assets/js/dependencies/jquery-3.2.1.min.js')
        .evaluate((articles) => {
            var article = {};
            var list = document.querySelectorAll('h3 a');
            var elementArray = [...list];
            elementArray.forEach(el => {
                article.title = el.innerText;
                articles.push(article);
                myModel.create({title : article.title}).exec((err, event) => {
                });
            });
            return articles;
        }, articles)
        .end())
        .then((data) => {
            console.log(data);
        });
Problems
News (myModel in the code above) is not defined inside the evaluate() function. The evaluate() function seems to only accept strings, and News is a model generated by sails.js.
In addition, the array of articles is filled with the same data.
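The duplicated entries are consistent with a single article object being created once and then pushed on every iteration, so every slot in the array refers to the same object and shows the last title. A minimal sketch of that effect in plain JavaScript, where the titles array is a hypothetical stand-in for the text of the matched h3 a elements:

```javascript
// Hypothetical stand-in for the innerText of each matched 'h3 a' element.
const titles = ['First', 'Second', 'Third'];

// Pattern from the question: one object created outside the loop,
// mutated and pushed on every iteration.
const shared = {};
const buggy = [];
titles.forEach(t => {
  shared.title = t;   // overwrites the title on the same object
  buggy.push(shared); // pushes another reference to that same object
});

// Alternative: build a fresh object for each element.
const fixed = titles.map(t => ({ title: t }));

console.log(buggy.map(a => a.title)); // [ 'Third', 'Third', 'Third' ]
console.log(fixed.map(a => a.title)); // [ 'First', 'Second', 'Third' ]
```

Moving the object creation inside the loop (or using map, as here) should give one distinct entry per element.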
Is there an easier way to scrape a web page after the DOM has been rendered, using NodeJS?
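One approach that may work, sketched here under assumptions: as far as I understand, Nightmare serializes the function passed to evaluate() and runs it inside the page, so Node-side variables such as a sails.js model are not visible there. The sketch below only extracts plain data in the page and saves it afterwards in Node. The helper names extractTitles, saveArticles and scrape are hypothetical, and the 'h3 a' selector and the Model.create(...).exec(...) call are taken from the code above.

```javascript
// Runs inside the page. It must be self-contained: it can only use its own
// arguments and browser globals such as `document`.
function extractTitles(selector) {
  return Array.from(document.querySelectorAll(selector)).map(el => ({
    title: el.innerText.trim()
  }));
}

// Runs in Node. Saves each article through a model that exposes
// create(...).exec(callback), as in the question's code.
function saveArticles(Model, articles) {
  return Promise.all(articles.map(article => new Promise((resolve, reject) => {
    Model.create(article).exec((err, record) => (err ? reject(err) : resolve(record)));
  })));
}

// Wiring (hypothetical helper): scrape in the page, then persist in Node.
function scrape(url, Model) {
  const Nightmare = require('nightmare'); // loaded lazily, only when scraping
  return Nightmare({ show: false })
    .goto(url)
    .wait('h3 a')                    // wait until the rendered links exist
    .evaluate(extractTitles, 'h3 a') // returns plain, serializable data
    .end()
    .then(articles => saveArticles(Model, articles));
}
```

Under these assumptions it would be called as scrape('some-url', News).then(console.log). Keeping the extraction and the database calls in separate functions also makes each part testable without launching a browser.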