How to handle downloads with phantomjs / casperjs?
Is it possible to download a file to a folder and give it a specific name using PhantomJS / CasperJS?
For example, how do I download the CSV at the bottom of this page: http://www.nasdaq.com/symbol/aapl/historical and name it aapl.txt?
Download link:
<a href="javascript:getQuotes(true);" id="lnkDownLoad">
Download this file in Excel Format
</a>
It calls a JavaScript function whose purpose (I think) is to obfuscate the direct download link, but when you click it in a browser, the usual download prompt appears. I would like PhantomJS to handle this download the normal way, so that I can choose the file name and where on disk to save it.
Edit: This code should click on the download link and listen for incoming resources:
var casper = require('casper').create();
var x = require('casper').selectXPath;
casper.userAgent("Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36");
casper.start('http://www.nasdaq.com/symbol/aapl/historical', function () {
    //this.echo(this.getTitle());
    console.log('TITLE : ' + this.getTitle());
});
casper.wait(5000, function() {
    casper.on('resource.received', function (resource) {
        casper.echo("LISTENING");
        casper.echo(resource.url);
    });
});
casper.thenClick(x('//*[@id="lnkDownLoad"]'), function() {
    console.log('CLICKED');
});
casper.run();
But for some reason, I am not getting any file, unlike in a normal browser. Console log:
TITLE : (AAPL) Historical Prices & Data - NASDAQ.com
CLICKED
LISTENING
http://www.nasdaq.com/symbol/aapl/historical
Any idea?
If you look at the code, you can see that the link is not really obfuscated. Clicking the download link does download the file in CasperJS, but the file's content cannot easily be accessed. The culprit is PhantomJS: it does not expose the content of requests and responses (see page.onResourceReceived), only their metadata.
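For illustration, this is roughly the extent of what that metadata gives you. The response object PhantomJS passes to onResourceReceived (and CasperJS to the resource.received event) has a headers field that is a list of {name, value} objects, so you can read, for example, the file name the server suggests, but not the file's bytes. The helper suggestedFilename is hypothetical, and the header values in the example are made up, not taken from nasdaq.com. It is plain ES5, so it should run in Node as well as in a CasperJS script.

```javascript
// Hypothetical helper: given response headers in the PhantomJS shape
// (an array of {name, value} objects), return the file name suggested
// by Content-Disposition, or null if there is none.
function suggestedFilename(headers) {
  for (var i = 0; i < headers.length; i++) {
    var header = headers[i];
    // HTTP header names are case-insensitive.
    if (header.name.toLowerCase() !== 'content-disposition') {
      continue;
    }
    // Handles filename=foo.csv and filename="foo.csv";
    // the RFC 5987 form filename*=... is not supported.
    var match = /filename\s*=\s*"?([^";]+)"?/i.exec(header.value);
    return match ? match[1].trim() : null;
  }
  return null;
}

// Made-up metadata: the headers are visible, the response body is not.
var headers = [
  { name: 'Content-Type', value: 'application/octet-stream' },
  { name: 'Content-Disposition', value: 'attachment; filename="HistoricalQuotes.csv"' }
];
console.log(suggestedFilename(headers)); // HistoricalQuotes.csv
```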
This means that you need to repeat the request yourself using casper.download(). When you view the page source in your browser's developer tools, you can see that getQuotes(true) is called on click. Searching for getQuotes (Ctrl+Shift+F in Chrome) leads you to the function itself.
Analyzing this function, you may conclude that $("#getFile").submit(); performs the actual download, which is just a POST request from a form with a lot of hidden values. If you look closely at getQuotes, you can see that the function also adds one of the hidden values to the form. This means that you need to call getQuotes before you start forging the request.
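Before the form can be replayed, its hidden fields have to be turned into a plain parameter object. A minimal sketch in plain JavaScript, assuming the fields are already available as {name, value} pairs (the shape jQuery's serializeArray() returns). toParamObject is a hypothetical name, and the field names and values in the example are made up.

```javascript
// Hypothetical helper: turn [{name, value}, ...] into a plain object.
// A repeated field name keeps only its last value, which is fine for
// single-valued hidden inputs but would drop values of multi-selects.
function toParamObject(pairs) {
  var paramObj = {};
  pairs.forEach(function (kv) {
    paramObj[kv.name] = kv.value;
  });
  return paramObj;
}

// Made-up fields, in the shape serializeArray() produces.
var fields = [
  { name: 'symbol', value: 'aapl' },
  { name: 'timeframe', value: '3m' }
];
console.log(JSON.stringify(toParamObject(fields))); // {"symbol":"aapl","timeframe":"3m"}
```

A plain object is used because it is the form of data casper.download() accepts for POST parameters, and because values returned from evaluate() must be serializable.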
Forging the request is relatively straightforward. First, build a parameter object from the form's fields to send in the POST request; second, find out the request URL, which is the form's action attribute. Here's the complete code:
var casper = require('casper').create();
var x = require('casper').selectXPath;
casper.userAgent("Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36");
casper.start('http://www.nasdaq.com/symbol/aapl/historical');
casper.wait(5000); // probably not necessary
// Clicking runs getQuotes(true), which adds the missing hidden value to #getFile
casper.thenClick('#lnkDownLoad');
casper.then(function(){
    // Collect the form's fields into a plain object inside the page context
    var parameters = this.evaluate(function(){
        // from http://stackoverflow.com/a/2403206
        var paramObj = {};
        $.each($('#getFile').serializeArray(), function(_, kv) {
            paramObj[kv.name] = kv.value;
        });
        return paramObj;
    });
    // The form's action attribute is the URL the POST request is sent to
    var url = this.getElementAttribute('#getFile', 'action');
    // Repeat the request and save the response under the name of your choice
    this.download(url, 'aapl.csv', 'POST', parameters);
});
casper.run();
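casper.download() takes care of sending the parameters. If you want to compare the forged request with what the browser sends (for example in the Network tab of the developer tools) or replay it with another HTTP client, it helps to know its shape: the form fields encoded as application/x-www-form-urlencoded and POSTed to the form's action URL, which getElementAttribute() returns as the raw, possibly relative, attribute value. A minimal sketch with two hypothetical helpers, assuming a modern engine such as Node (URL and URLSearchParams are not available in PhantomJS); the action value and field names in the example are made up.

```javascript
// Hypothetical helper: encode a parameter object the way a form submit does.
// URLSearchParams percent-encodes special characters and turns spaces into '+'.
function encodeFormBody(params) {
  return new URLSearchParams(params).toString();
}

// Hypothetical helper: resolve a possibly relative action attribute
// against the URL of the page that contains the form.
function resolveAction(action, pageUrl) {
  return new URL(action, pageUrl).href;
}

console.log(encodeFormBody({ symbol: 'aapl', timeframe: '3 months' }));
// symbol=aapl&timeframe=3+months
console.log(resolveAction('download.aspx', 'http://www.nasdaq.com/symbol/aapl/historical'));
// http://www.nasdaq.com/symbol/aapl/download.aspx
```

A server decoding form data treats '+' and '%20' the same, so a body encoded this way should match the browser's request even if the spaces are written differently.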