Get HTML generated by JavaScript using PhantomJS
I am trying to use PhantomJS to get the HTML generated by a dynamic page. I assumed it would be easy, but after hours of trying I still have had no luck.
The page's own source looks like this, and this is also exactly what ends up being saved to 1.html:
<!doctype html>
<html lang="cs" ng-app="appId">
<head ng-controller="MainCtrl">
(omitted some lines)
<script src="/js/conf/config.js?pars"></script>
<script src="/js/all.js?pars"></script>
</head>
<body>
<!--<![endif]-->
<div site-loader></div>
<div page-layout>
<div ng-view></div>
</div>
</body>
</html>
All of the page content is loaded into the site-loader div, but I only ever get the source above, even though I wait with a timeout before saving the HTML from PhantomJS. Here is the code I'm using:
var url = 'http:...';
var page = require('webpage').create();
var fs = require('fs');
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Fail');
        phantom.exit();
    } else {
        window.setTimeout(function () {
            fs.write('1.html', page.content, 'w');
            phantom.exit();
        }, 2000); // Change timeout as required to allow sufficient time
    }
});
Please, what am I doing wrong?
EDIT: I decided to try the PJscrapper framework and set it up to scrape the entire content of the div block. All I got was garbage:
["","\n\t\tif (window.DOT) {\n\t\t\tDOT.cfg({service: 'sreality', impress: false});\n\t\t}\n\t","","Loader.load()","",""]
It seems I am seriously missing something: I always get the markup as it is before Loader.load() takes effect, and a timeout evidently does not solve the problem.
1 answer
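Read the rendered DOM from inside the page with page.evaluate, after giving the scripts some time to run. This should do the trick: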
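// Same setup as in the question
var url = 'http:...';
var page = require('webpage').create();
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to load the url!');
        phantom.exit();
    } else {
        window.setTimeout(function () {
            // Runs inside the page, so it sees the DOM as it is at this moment
            var results = page.evaluate(function () {
                return document.documentElement.innerHTML;
            });
            console.log(results);
            phantom.exit();
        }, 200); // Increase if the page needs longer to render
    }
});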
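If a fixed delay turns out to be too short (or wastefully long), one alternative is to poll until the view has actually rendered. The following is only a minimal sketch under some assumptions: waitFor is a hypothetical helper, not part of the PhantomJS API; the page is treated as ready once the ng-view div has child elements, which may not be the right condition for this site; and the 10 second limit and 250 ms interval are arbitrary.

```javascript
// Hypothetical helper: call check() every intervalMs until it returns a
// truthy value (then onReady) or timeoutMs has elapsed (then onTimeout).
function waitFor(check, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        var result = check();
        if (result) {
            clearInterval(timer);
            onReady(result);
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            onTimeout();
        }
    }, intervalMs);
}

// The PhantomJS part only runs when the `phantom` global exists.
if (typeof phantom !== 'undefined') {
    var url = 'http:...'; // elided, as in the question
    var page = require('webpage').create();
    var fs = require('fs');

    page.open(url, function (status) {
        if (status !== 'success') {
            console.log('Unable to load the url!');
            phantom.exit(1);
            return;
        }
        waitFor(function () {
            // Assumption: the view is rendered once [ng-view] has children.
            return page.evaluate(function () {
                var view = document.querySelector('[ng-view]');
                return !!view && view.children.length > 0;
            });
        }, function () {
            fs.write('1.html', page.content, 'w');
            phantom.exit();
        }, function () {
            console.log('Timed out waiting for the view to render');
            phantom.exit(1);
        }, 10000, 250);
    });
}
```

The readiness check is the part to adapt: any condition that can be evaluated inside the page (an element appearing, a loader disappearing) can be returned from the page.evaluate callback instead.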