How do I download a ZIP file from a web URL using a Perl script?

I want to download the zip file that is available at http://www.nseindia.com/content/equities/cmbhav.htm by clicking Download csv file.

If you right click on "Download csv file" and select the link location to copy, then the URL pattern looks like http://www.nseindia.com/content/historical/EQUITIES/2012/MAR/cm23MAR2012bhav.csv.zip ...

I want to write a Perl Script that will download a ZIP file from a URL.

The following code doesn't work:

#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;

my $url = 'http://www.nseindia.com/content/historical/EQUITIES/2012/MAR' ;
my $file = 'cm23MAR2012bhav.csv.zip'    ;
getstore($url, $file) ;

      



2 answers


If you need to change the user agent and still want to use LWP::Simple, you can use the exported $ua:

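use strict;
use warnings;
use v5.10;    # enables say

use File::Basename;
use LWP::Simple qw($ua getstore);
use URI;

my $url = URI->new( 'http://www.nseindia.com/content/historical/EQUITIES/2012/MAR/cm23MAR2012bhav.csv.zip' );

# Send an Accept header with every request
$ua->default_headers( HTTP::Headers->new(
    Accept => '*/*',
    )
    );

# Replace the default libwww-perl user agent string with a browser's
$ua->agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.54.16 (KHTML, like Gecko) Version/5.1.4 Safari/534.54.16");

# Save under the last path component of the URL; getstore returns the HTTP status code
my $rc = getstore( $url, basename( $url->path ) );
say "Result is $rc";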

      

It turns out that the combination of the user agent string and the Accept header does the trick. Typically, these issues boil down to making your LWP request look the same as the request sent by your browser. I use HTTPScoop to view browser transactions, but there are many programs out there that will do the same for you.
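To compare against what the browser sends, it can help to print the request you are building. The following is only a sketch: build_browser_like_request is a hypothetical helper name, and the output shows only the headers set here, not any that LWP adds while sending.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical helper (not part of LWP): build a GET request that
# carries a browser-like User-Agent and Accept header.
sub build_browser_like_request {
    my ($url, $agent) = @_;
    return HTTP::Request->new(
        GET => $url,
        [ 'User-Agent' => $agent, Accept => '*/*' ],
    );
}

my $request = build_browser_like_request(
    'http://www.nseindia.com/content/historical/EQUITIES/2012/MAR/cm23MAR2012bhav.csv.zip',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.54.16 (KHTML, like Gecko) Version/5.1.4 Safari/534.54.16',
);

# Print the request line and headers for a side-by-side comparison
# with the browser's request.
print $request->as_string;
```

Any header that appears in the browser capture but not in this output is a candidate to add.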



If things get even harder, I prefer Mojo::UserAgent. It's a little easier to play with the transaction:

use strict;
use warnings;

use File::Basename;
use Mojo::UserAgent;
use URI;

my $url = URI->new( 'http://www.nseindia.com/content/historical/EQUITIES/2012/MAR/cm23MAR2012bhav.csv.zip' );
my $file = basename( $url->path );
printf "URL: %s\nFile: %s\n", $url, $file;

my $ua = Mojo::UserAgent->new;

# Recent Mojolicious releases set the User-Agent string on the transactor;
# older releases used $ua->name(...) instead.
$ua->transactor->name(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.54.16 (KHTML, like Gecko) Version/5.1.4 Safari/534.54.16'
    );

my $response = $ua->get( $url->as_string, { Accept => '*/*' } )->res;

open my $fh, '>', $file or die "Could not open [$file]: $!";
binmode $fh;    # the ZIP is binary data
print $fh $response->body;
close $fh or die "Could not close [$file]: $!";
printf "Status: %d\n", $response->code;

      



If you print the return value of getstore:

       print getstore($url, $file);

      

you will see that it returns 403 (Forbidden).
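A bare status number is easy to overlook, so it may help to print it with its reason phrase. This is a minimal sketch: describe_status is a hypothetical helper, and the script assumes it is called with a URL and a target file name as arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Status qw(is_success status_message);
use LWP::Simple qw(getstore);

# Hypothetical helper: turn a numeric HTTP status into "code reason".
sub describe_status {
    my ($code) = @_;
    my $message = status_message($code) // 'Unknown status';
    return "$code $message";
}

# Assumed usage: perl check_status.pl URL FILE
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    my $rc = getstore($url, $file);    # returns the HTTP status code
    print describe_status($rc), "\n";
    exit( is_success($rc) ? 0 : 1 );
}
```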

ADD

Experimenting with curl, it seems that the server validates the user agent, so you cannot use LWP::Simple with its default settings; you have to set the user agent to that of a real browser.

ADD2

The following works:



#! /usr/bin/perl -w

use warnings;
use strict;

use LWP::UserAgent;
my $url = 'http://www.nseindia.com/content/historical/EQUITIES/2012/MAR/cm23MAR2012bhav.csv.zip';
#my $file = 'cm23MAR2012bhav.csv.zip';
#my $url = 'http://localhost:11000';

my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (FM Scene 4.6.1)");
my $req = HTTP::Request->new(GET => $url);
$req->header(Accept => "*/*");
# $req->remove_header('Connection');  # does not work
# $req->remove_header('TE');          # does not work
my $res = $ua->request($req);
if ($res->is_success)
{
    binmode STDOUT;    # the ZIP is binary data; redirect STDOUT to a file
    print $res->content;
}
else
{
    print $res->status_line, "\n";
}

      

The commented-out remove_header lines do not remove the TE and Connection headers, because those are inserted at the protocol layer; removing them requires a different procedure, which I don't know.

This is enough to make it work anyway.

(Edit: I had a trailing space in the user agent string, which caused LWP to append libwww-perl to it, and this is why the server gave a 403.)

Important note

You have to redirect the output or change the code a bit to save the content to a file. Also note that to download the zip you have to provide the correct URL, that of the file itself, not the URL of the page containing the zip link.
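One way to save the content directly, sketched here with the :content_file option of LWP::UserAgent's get method. The helper names filename_from_url and download_to_file are made up for illustration, and the script assumes the file URL is passed as its first argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);
use LWP::UserAgent;
use URI;

# Hypothetical helper: use the last path component of the URL as the file name.
sub filename_from_url {
    my ($url) = @_;
    return basename( URI->new($url)->path );
}

# Hypothetical helper: fetch $url with a browser-like user agent and let
# LWP write the response body to $file instead of keeping it in memory.
sub download_to_file {
    my ($url, $file) = @_;
    my $ua = LWP::UserAgent->new;
    # No trailing space, otherwise LWP appends its own libwww-perl string.
    $ua->agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (FM Scene 4.6.1)');
    return $ua->get( $url, Accept => '*/*', ':content_file' => $file );
}

# Assumed usage: perl download.pl URL
if (@ARGV) {
    my $url  = $ARGV[0];
    my $file = filename_from_url($url);
    my $res  = download_to_file($url, $file);
    if ($res->is_success) {
        print "Saved $file\n";
    }
    else {
        print 'Failed: ', $res->status_line, "\n";
    }
}
```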
