Dump HTML page including iframes
I would like to dump the HTML content of a web page, including the HTML of the documents embedded in its <iframe> elements. The Elements tab of the Chrome developer tools displays iframe content this way.
By "dump HTML content" I mean doing this with a browser automation tool such as Selenium or PhantomJS. Do any of these tools have this capability built in?
For example, the HTML dump I want for this page should include the HTML source of this inline page.
You can use PhantomJS to achieve this.
Here is a piece of PhantomJS server-side code:
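    var system = require('system');
    var url = system.args[1] || '';

    if (url.length > 0) {
        var page = require('webpage').create();
        page.open(url, function (status) {
            if (status !== 'success') {
                // Exit instead of hanging when the page fails to load
                console.log('Failed to load ' + url);
                phantom.exit(1);
                return;
            }
            var timer;
            // Poll until the page marks itself as ready, then print its HTML
            var checker = function () {
                var html = page.evaluate(function () {
                    var body = document.getElementsByTagName('body')[0];
                    if (body.getAttribute('data-status') === 'ready') {
                        return document.getElementsByTagName('html')[0].outerHTML;
                    }
                });
                if (html) {
                    clearInterval(timer); // the timer was created with setInterval
                    console.log(html);
                    phantom.exit();
                }
            };
            timer = setInterval(checker, 100);
        });
    } else {
        console.log('No URL given');
        phantom.exit(1);
    }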
In the HTML, set the data-status attribute on <body> to "ready" to tell PhantomJS when the page has finished rendering; this works if you control the page. If the page is not yours, another option is to wait for a sufficiently long timeout before dumping it.
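Note that outerHTML of the top-level document contains the <iframe> tags but not the documents loaded inside them. The following is a minimal sketch of how the frame contents could be collected as well, assuming the PhantomJS WebPage members page.frameContent, page.framesCount, page.switchToFrame(index) and page.switchToParentFrame() behave as documented, and that switchToFrame returns true on success. The helper name collectFrames and the path labels are hypothetical, introduced only for illustration.

```javascript
// Hypothetical helper: returns an array of { path, html } entries, one for the
// current frame and one for every frame nested below it.
// `page` is assumed to be a PhantomJS WebPage object (or any object exposing
// frameContent, framesCount, switchToFrame and switchToParentFrame).
function collectFrames(page, path) {
    path = path || 'main';
    // HTML of whichever frame is currently selected
    var results = [{ path: path, html: page.frameContent }];
    // Number of direct child frames of the current frame
    var count = page.framesCount;
    for (var i = 0; i < count; i++) {
        // Descend only if the switch succeeded (assumed to return true)
        if (page.switchToFrame(i)) {
            results = results.concat(collectFrames(page, path + '/' + i));
            // Go back up so the next sibling index refers to the same parent
            page.switchToParentFrame();
        }
    }
    return results;
}
```

In the script above, collectFrames(page) could be called once the page is ready, in place of the outerHTML lookup, and each entry's html printed with console.log.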