Nightmare, PhantomJS and page data extraction

I am new to Nightmare / PhantomJS and am struggling to get a simple list of all the tags on a given page. I am working on Ubuntu 14.04 after building PhantomJS from source and installing NodeJS, Nightmare, etc. manually, and the other functions work as I expect.

Here is the code I'm using:

var Nightmare = require('nightmare');
new Nightmare()
  .goto("http://www.google.com")
  .wait()
  .evaluate(function () {
    var a = document.getElementsByTagName("*");
    return a;
  }, function (i) {
    for (var index = 0; index < i.length; index++) {
      if (i[index]) {
        console.log("Element " + index + ": " + i[index].nodeName);
      }
    }
  })
  .run(function (err, nightmare) {
    if (err) {
      console.log(err);
    }
  });

      

When I run this inside a "real" browser, I get a list of all types of tags on the page (HTML, HEAD, BODY, ...). When I run this with node GetTags.js I only get one line of output:

Element 0: HTML

      

I'm sure this is a newbie problem, but what am I doing wrong here?


1 answer


PhantomJS has two contexts. The page context, which provides access to the DOM, is accessible only through evaluate(). Values must therefore be passed explicitly into and out of the page context. But there is a limitation (docs):

Note: The arguments and the return value of the evaluate function must be a simple primitive object. Rule of thumb: if it can be serialized via JSON, then it is fine.

Closures, functions, DOM nodes, etc. will not work!
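To get a feel for the rule of thumb, here is a minimal sketch that pushes a value through JSON and back. The helper name roundTrip is made up for illustration, and PhantomJS's internal serialization may not be exactly this, so treat it only as a rough model of what can cross the context boundary:

```javascript
// Hypothetical helper: approximates what survives a trip between contexts
// by serializing to JSON and parsing the result again.
function roundTrip(value) {
  var text = JSON.stringify(value);
  // JSON.stringify returns undefined for values it cannot represent at all
  // (for example a bare function), so there is nothing to parse in that case.
  return text === undefined ? undefined : JSON.parse(text);
}

// An array of strings comes back intact.
var tagNames = roundTrip(["HTML", "HEAD", "BODY"]);

// A function property is silently dropped; only the plain data remains.
var element = roundTrip({ name: "DIV", click: function () {} });

// A bare function does not survive at all.
var callback = roundTrip(function () {});

console.log(tagNames, element, callback);
```

Plain strings, numbers, arrays and simple objects make it through; anything with behavior attached does not.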



The Nightmare function evaluate() is just a wrapper around the PhantomJS function of the same name. This means that you need to work with the elements inside the page context and pass only a serializable representation of them to the outside. For example:

.evaluate(function () 
{
    var a = document.getElementsByTagName("div");
    return a.length;
}, 
function(i) 
{
    console.log(i + " divs available");
})
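The same idea can be applied to the tag list from the question: collect the node names as plain strings inside the page and return only that array. This is a sketch, not tested against every Nightmare version; the function names collectTagNames and formatTagNames are made up for illustration, and the wiring in the trailing comment assumes the same callback-style evaluate() used in the question.

```javascript
// Meant to run inside the page context: it may touch the DOM, but it
// returns only JSON-serializable data (an array of strings).
function collectTagNames() {
  var nodes = document.getElementsByTagName("*");
  var names = [];
  for (var i = 0; i < nodes.length; i++) {
    names.push(nodes[i].nodeName);
  }
  return names;
}

// Meant to run in the Node context: receives the plain array and
// builds the same "Element N: NAME" lines as the question's code.
function formatTagNames(names) {
  var lines = [];
  for (var i = 0; i < names.length; i++) {
    lines.push("Element " + i + ": " + names[i]);
  }
  return lines;
}

// Assumed wiring, mirroring the question's chain:
//   new Nightmare()
//     .goto("http://www.google.com")
//     .wait()
//     .evaluate(collectTagNames, function (names) {
//       console.log(formatTagNames(names).join("\n"));
//     })
//     .run(function (err) { if (err) console.log(err); });
```

Note that collectTagNames does not reference any variable from the outer script, since the function is executed in the page and has no access to the Node-side scope.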

      
