Scraper using NightmareJS with NodeJS

I am trying to scrape a web page and store the results in my database. I am using Node.js (Sails.js framework).

This is a working example using cheerio:

getRequest('some-url').then((data) => {
    const $ = cheerio.load(data);
    let title = $('.title').each(function (i, element) {
        let a = $(this);
        let title = a.text(); // Title
        MyModel.create({title : title}).exec((err, event) => {
        });
    });
});

      

The problem with cheerio is that it does not act like a browser: it does not execute JavaScript, so it cannot see content on JavaScript-rendered pages.

So I decided to try Nightmare.js, and it was a nightmare to do the same:

var articles = [];
Promise.resolve(nightmare
    .goto('some-url')
    .wait(0)
    .inject('js', 'assets/js/dependencies/jquery-3.2.1.min.js')
    .evaluate((articles) => {
        var article = {};
        var list = document.querySelectorAll('h3 a');
        var elementArray = [...list];
        elementArray.forEach(el => {
            article.title = el.innerText;
            articles.push(article);
            MyModel.create({title : article.title}).exec((err, event) => {
            });
        });
        return articles;
    }, articles)
    .end())
    .then((data) => {
        console.log(data);
    });

      

Problems

The model (News) is not defined inside the evaluate() function. The evaluate() function seems to accept only strings, and News is the model generated by sails.js (MyModel in the code above).

In addition, the articles array is filled with the same data.
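
The repeated entries are probably caused by `article` being created once, outside the loop: every `push` stores a reference to the same object, so each iteration overwrites the title seen through all earlier entries. A minimal sketch of the difference, using plain strings instead of DOM nodes (the `titles` values and both function names are made up for illustration):

```javascript
// Hypothetical titles standing in for the text of the matched elements.
const titles = ['first', 'second', 'third'];

// Mirrors the snippet above: one shared object is pushed on every iteration.
function collectShared(items) {
  const article = {};
  const out = [];
  items.forEach((t) => {
    article.title = t; // mutates the single shared object
    out.push(article); // pushes another reference to it
  });
  return out;
}

// Creates a fresh object for each element instead.
function collectFresh(items) {
  return items.map((t) => ({ title: t }));
}

console.log(collectShared(titles).map((a) => a.title)); // [ 'third', 'third', 'third' ]
console.log(collectFresh(titles).map((a) => a.title));  // [ 'first', 'second', 'third' ]
```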

Is there an easier way to scrape a web page after the DOM has been rendered, using Node.js?
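
One possible direction, sketched under assumptions: the function passed to `evaluate()` runs inside the page rather than in the Node process, so only serializable values cross that boundary and Node-side objects such as the Sails model are not visible there. The sketch below returns plain data from the page and leaves saving to the Node side. `extractTitles`, `scrapeTitles`, and the `save` callback are hypothetical names; the `'h3 a'` selector comes from the snippet above, and waiting for that selector instead of `wait(0)` is an assumption about when the page has finished rendering.

```javascript
// Runs in the page context: it can only use its arguments and browser globals.
function extractTitles(selector) {
  return Array.from(document.querySelectorAll(selector))
    .map((el) => ({ title: el.innerText.trim() }));
}

// Runs in Node. `save` is a hypothetical callback, for example
// (article) => MyModel.create(article) in a Sails app.
async function scrapeTitles(url, selector, save) {
  const Nightmare = require('nightmare'); // loaded lazily, only when scraping
  const nightmare = Nightmare({ show: false });

  // Only the returned array of plain objects is sent back to Node.
  const articles = await nightmare
    .goto(url)
    .wait(selector)
    .evaluate(extractTitles, selector)
    .end();

  // Persist on the Node side, where the model is in scope.
  for (const article of articles) {
    await save(article);
  }
  return articles;
}
```

Called as `scrapeTitles('some-url', 'h3 a', save)`, this keeps all database calls in Node, and the page function only has to return an array of plain objects.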
