R httr: downloading files over FTP fails with error 421 "Too many connections from your internet address"

EDIT - short question: does httr have a finalizer that closes the FTP connection?

I am downloading climate forecast files from the FTP server of the NASA NEX project using the httr package.

My script:

library(httr)

var  <- c("pr", "tasmin", "tasmax")
rcp  <- c("rcp45", "rcp85")
mod  <- c("inmcm4", "GFDL-CM3")
year <- seq(2040, 2080, 1)

for (v in var) {
  for (r in rcp) {
    # paste0() has no sep argument, so none is passed
    url <- paste0('ftp://ftp.nccs.nasa.gov/BCSD/', r, '/day/atmos/', v, '/r1i1p1/v1.0/')
    for (m in mod) {
      for (y in year) {
        nfile    <- paste0(v, '_day_BCSD_', r, '_r1i1p1_', m, '_', y, '.nc')
        url1     <- paste0(url, nfile)
        destfile <- paste0('mypath', r, '/', v, '/', nfile)
        GET(url = url1,
            authenticate(user = 'NEXGDDP', password = '', type = "basic"),
            write_disk(path = destfile, overwrite = FALSE))
        Sys.sleep(0.5)
      }
    }
  }
}

      

After a while the server terminates my connection with the following error: "421 Too many connections from your Internet address".

I read here that this has to do with the number of open connections and that I have to close them on each iteration (I'm not sure this really makes sense, though). Is there a way to close an FTP connection with the httr package?



2 answers


Suggested solution (final answer)

The suggested solution is to set the maximum number of connections that httr keeps to the FTP server:

> config(CURLOPT_MAXCONNECTS=5)
<request>
Options:
* CURLOPT_MAXCONNECTS: 5

      


Description

Preamble:

The httr package is a wrapper around curl. This matters because httr abstracts the curl interface: in this case we want to change curl's behavior, and we do so by changing the curl configuration through the httr abstraction.

  • By default, httr automatically shares connections across requests to the same website (curl handles are managed automatically), maintains cookies across requests, and uses an up-to-date root-level SSL certificate store.

In this context we do not control the FTP server, only the client's requests to it. Hence we can change the default behavior with httr::config() to reduce the number of simultaneous FTP connections.
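To make the handle sharing concrete, here is a minimal sketch, assuming a current httr in which handle_find() and handle_reset() are exported (see ?handle_pool). It does not contact the server. It only shows that httr keeps one pooled handle per host and that the pool entry can be dropped, which is one way to force a fresh connection on the next request. Whether the old FTP connection is closed right away or only once the discarded handle is garbage-collected is an assumption worth checking on your own setup.

```r
library(httr)

base_url <- "ftp://ftp.nccs.nasa.gov/BCSD/"

# httr looks up one pooled handle per scheme/host/port and reuses it
h1 <- handle_find(base_url)
h2 <- handle_find(paste0(base_url, "rcp45/day/atmos/pr/r1i1p1/v1.0/"))
same_handle <- identical(h1, h2)

# Drop the pooled handle; the next lookup for this host creates a new one
handle_reset(base_url)
h3 <- handle_find(base_url)
fresh_handle <- !identical(h1, h3)
```

Calling handle_reset() every few downloads inside the loop would be one thing to try if the server keeps counting stale connections against your address.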

Query httr curl ftp options

To list the available FTP-related options, we can run the following command:

> httr_options("ftp")
                       httr                         libcurl    type
49              ftp_account             CURLOPT_FTP_ACCOUNT  string
50  ftp_alternative_to_user CURLOPT_FTP_ALTERNATIVE_TO_USER  string
51  ftp_create_missing_dirs CURLOPT_FTP_CREATE_MISSING_DIRS integer
52           ftp_filemethod          CURLOPT_FTP_FILEMETHOD integer
53     ftp_response_timeout    CURLOPT_FTP_RESPONSE_TIMEOUT integer
54         ftp_skip_pasv_ip        CURLOPT_FTP_SKIP_PASV_IP integer
55              ftp_ssl_ccc             CURLOPT_FTP_SSL_CCC integer
56             ftp_use_eprt            CURLOPT_FTP_USE_EPRT integer
57             ftp_use_epsv            CURLOPT_FTP_USE_EPSV integer
58             ftp_use_pret            CURLOPT_FTP_USE_PRET integer
59                  ftpport                 CURLOPT_FTPPORT  string
60               ftpsslauth              CURLOPT_FTPSSLAUTH integer
196            tftp_blksize            CURLOPT_TFTP_BLKSIZE integer 

      

To read the libcurl documentation for an option, we can call curl_docs("CURLOPT_FTP_ACCOUNT").
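The same lookup can be done programmatically. This is a small sketch, assuming a current httr where httr_options() returns a data frame with the columns httr, libcurl and type, as in the listing above. It translates a libcurl constant into the short name that, as far as I know, config() expects in current versions.

```r
library(httr)

# All options whose httr or libcurl name matches "maxconnects"
opts <- httr_options("maxconnects")

# Pick the short httr-style name that belongs to the libcurl constant
httr_name <- opts$httr[opts$libcurl == "CURLOPT_MAXCONNECTS"]
```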

Changing the httr request configuration



You can either change httr's global curl configuration with set_config(), or wrap a single request in with_config(). In this case we want to limit the maximum number of connections to the FTP server, so we look for a matching option:

httr_options("max")
                    httr                      libcurl    type
95  max_recv_speed_large CURLOPT_MAX_RECV_SPEED_LARGE  number
96  max_send_speed_large CURLOPT_MAX_SEND_SPEED_LARGE  number
97           maxconnects          CURLOPT_MAXCONNECTS integer
98           maxfilesize          CURLOPT_MAXFILESIZE integer
99     maxfilesize_large    CURLOPT_MAXFILESIZE_LARGE  number
100            maxredirs            CURLOPT_MAXREDIRS integer 

      

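Now we can look up curl_docs("CURLOPT_MAXCONNECTS"): this is the option we want.

Now we have to set it. Note that config() on its own only builds a request object, as printed here; it takes effect once it is passed to set_config(), with_config(), or a request such as GET().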

> config(CURLOPT_MAXCONNECTS=5)
<request>
Options:
* CURLOPT_MAXCONNECTS: 5
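As a hedged sketch of how the setting could actually be applied: the snippet below uses the short httr name maxconnects from the httr_options() listing and shows both the global and the per-call route. download_one is a hypothetical helper introduced only for illustration, and the with_config() call is left commented out because it would contact the server. libcurl documents CURLOPT_MAXCONNECTS as the size of the connection cache, so whether it fully prevents the 421 error on this server is something to verify rather than assume.

```r
library(httr)

# config() only builds a request object; nothing is applied yet
cfg <- config(maxconnects = 5)

# Option 1: apply it to every subsequent httr request in this session
set_config(cfg)

# Option 2: apply it only to requests made inside a single call.
# download_one is a hypothetical helper, defined here for illustration.
download_one <- function(url, destfile) {
  GET(url,
      authenticate("NEXGDDP", "", type = "basic"),
      write_disk(destfile, overwrite = FALSE))
}
# with_config(cfg, download_one(url1, destfile))

# Undo the global setting when finished
reset_config()
```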

      

ref: https://cran.r-project.org/web/packages/httr/httr.pdf


Alternative RCurl approach

I know this is a little overkill, but I have included it as an alternative approach. Why? There is a subtle issue here with network bandwidth: running multiple concurrent FTP sessions can be slower than running them in sequence. My alternative would be to run the R script below, or to use curl directly from the Unix shell command line.

library(RCurl)
library(stringr)

# Log in as user NEXGDDP with an empty password
opts <- curlOptions(userpwd = "NEXGDDP:", netrc = TRUE)

rcpDir <- c("rcp45", "rcp85")
varDir <- c("pr", "tasmin", "tasmax")

for (rcp in rcpDir) {
  for (var in varDir) {
    # paste0() has no sep argument, so none is passed
    url <- paste0('ftp://ftp.nccs.nasa.gov/BCSD/', rcp, '/day/atmos/', var, '/r1i1p1/v1.0/')
    print(url)

    # List the remote directory and split the listing into file names
    filenames <- getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE, .opts = opts)
    filelist  <- unlist(str_split(filenames, "\n"))
    filelist  <- filelist[filelist != ""]

    # Keep the two models of interest, years 2040, 2050, ..., 2080 only
    filesavg <- filelist[str_detect(filelist, "inmcm4_20[4-8]0|GFDL-CM3_20[4-8]0")]
    urlsavg  <- str_c(url, filesavg)

    # Download one file at a time, skipping files already present in data/
    for (file in seq_along(urlsavg)) {
      fname <- str_c("data/", filesavg[file])
      if (!file.exists(fname)) {
        print(urlsavg[file])
        bin <- getBinaryURL(urlsavg[file], .opts = opts)
        writeBin(bin, fname)
        Sys.sleep(1)
      }
    }
  }
}
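The file filter in the script is easy to misread, so here is a small base-R sketch of what the pattern selects. The file names are illustrative stand-ins for a directory listing, and the second pattern is a hypothetical variant that is not part of the script above.

```r
# Illustrative names in the style of the directory listing
filelist <- c(
  "pr_day_BCSD_rcp45_r1i1p1_inmcm4_2040.nc",
  "pr_day_BCSD_rcp45_r1i1p1_inmcm4_2041.nc",
  "pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2080.nc",
  "pr_day_BCSD_rcp45_r1i1p1_CCSM4_2050.nc"
)

# Pattern from the script: only years ending in 0 between 2040 and 2080
decades <- filelist[grepl("inmcm4_20[4-8]0|GFDL-CM3_20[4-8]0", filelist)]

# Hypothetical variant: every year from 2040 to 2080 for the same two models
all_years <- filelist[grepl("(inmcm4|GFDL-CM3)_20([4-7][0-9]|80)\\.nc$", filelist)]
```

With the script's pattern the 2041 file is skipped, which is why the output lists only one file per decade.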

      

Code output

[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_inmcm4_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_inmcm4_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_inmcm4_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp45_r1i1p1_inmcm4_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_inmcm4_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_inmcm4_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_inmcm4_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_inmcm4_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp45_r1i1p1_inmcm4_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_GFDL-CM3_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_inmcm4_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_inmcm4_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_inmcm4_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_inmcm4_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp45/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp45_r1i1p1_inmcm4_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_inmcm4_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_inmcm4_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_inmcm4_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_inmcm4_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/pr/r1i1p1/v1.0/pr_day_BCSD_rcp85_r1i1p1_inmcm4_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_inmcm4_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_inmcm4_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_inmcm4_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_inmcm4_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmin/r1i1p1/v1.0/tasmin_day_BCSD_rcp85_r1i1p1_inmcm4_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_GFDL-CM3_2080.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2040.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2050.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2060.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2070.nc"
[1] "ftp://ftp.nccs.nasa.gov/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2080.nc"

      



(Not sure whether this should be an answer, but I can't fit the whole thing into a comment.)

To summarize, two alternative solutions worked, combining my approach with the one suggested by Technophobe. I've put the final code here as well in case it might be helpful for those experiencing the same problems.

Using httr:

library(httr)

# Configure a proxy, in case you are on an office/university network
set_config(use_proxy(url = 'http://~in_case_you_need_a_proxy', port = paste_here_port_no))

# Limit the number of simultaneous connections as suggested by Technophobe
# (the default is 5). config() alone only builds the option object, so it is
# passed to set_config() here; maxconnects is the httr name for
# CURLOPT_MAXCONNECTS.
set_config(config(maxconnects = 3))

var  <- c("pr", "tasmax", "tasmin")
rcp  <- c("rcp45", "rcp85")
mod  <- c("inmcm4", "GFDL-CM3")
year <- c(seq(2036, 2050, 1), seq(2061, 2080, 1))

for (v in var) {
  for (r in rcp) {
    url <- paste0('ftp://ftp.nccs.nasa.gov/BCSD/', r, '/day/atmos/', v, '/r1i1p1/v1.0/')
    for (m in mod) {
      for (y in year) {
        nfile    <- paste0(v, '_day_BCSD_', r, '_r1i1p1_', m, '_', y, '.nc')
        url1     <- paste0(url, nfile)
        destfile <- paste0('D:/destination_path/', r, '/', v, '/', nfile)
        GET(url = url1,
            authenticate(user = 'NEXGDDP', password = '', type = "basic"),
            write_disk(path = destfile, overwrite = FALSE))
        gc()          # trigger garbage collection between downloads
        Sys.sleep(1)  # pause before the next request
      }
    }
  }
}
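The nested loops in the httr script can also be flattened into a table of jobs, which makes it easy to skip files that are already on disk and resume after a 421 error. This is only a sketch in base R: the jobs and todo names are illustrative, and the destination path is the same placeholder as in the script.

```r
var  <- c("pr", "tasmax", "tasmin")
rcp  <- c("rcp45", "rcp85")
mod  <- c("inmcm4", "GFDL-CM3")
year <- c(2036:2050, 2061:2080)

# One row per file; the first argument varies fastest, like the innermost loop
jobs <- expand.grid(year = year, mod = mod, rcp = rcp, var = var,
                    stringsAsFactors = FALSE)

jobs$nfile <- sprintf("%s_day_BCSD_%s_r1i1p1_%s_%d.nc",
                      jobs$var, jobs$rcp, jobs$mod, jobs$year)
jobs$url <- sprintf("ftp://ftp.nccs.nasa.gov/BCSD/%s/day/atmos/%s/r1i1p1/v1.0/%s",
                    jobs$rcp, jobs$var, jobs$nfile)
jobs$destfile <- file.path("D:/destination_path", jobs$rcp, jobs$var, jobs$nfile)

# Keep only the files that have not been downloaded yet
todo <- jobs[!file.exists(jobs$destfile), ]
```

A single loop over seq_len(nrow(todo)) can then issue the same GET() call as above with todo$url[i] and todo$destfile[i].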

      



An alternative approach using RCurl

library(RCurl)

opts <- curlOptions(proxy = 'http://~in_case_you_need_a_proxy:paste_here_port_no',
                    userpwd = "NEXGDDP:", netrc = TRUE)

var  <- c("pr", "tasmax", "tasmin")
rcp  <- c("rcp45", "rcp85")
mod  <- c("inmcm4", "GFDL-CM3")
year <- c(seq(2036, 2050, 1), seq(2061, 2080, 1))

for (v in var) {
  for (r in rcp) {
    url <- paste0('ftp://ftp.nccs.nasa.gov/BCSD/', r, '/day/atmos/', v, '/r1i1p1/v1.0/')
    for (m in mod) {
      for (y in year) {
        nfile    <- paste0(v, '_day_BCSD_', r, '_r1i1p1_', m, '_', y, '.nc')
        url1     <- paste0(url, nfile)
        destfile <- paste0('D:/destination_path/', r, '/', v, '/', nfile)
        bin <- getBinaryURL(url1, .opts = opts)  # fetch the file into memory
        writeBin(bin, destfile)                  # then write it to disk
        Sys.sleep(1)
        gc()
      }
    }
  }
}

      

Both approaches have been tested and work. The second may still hit error 421, but only in a very limited number of cases (I've downloaded over 900 files, about 600 GB in total). Hopefully this is useful for other people working in the field.
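For the few downloads that still fail with 421, one option is to retry with an increasing pause. The helper below is a hypothetical sketch in base R, not part of httr or RCurl, and it assumes the failing call signals an R error, as getBinaryURL() does when the transfer fails.

```r
# Hypothetical helper: call f() up to max_tries times, doubling the wait
# after each failure, and stop with the last error message if all fail
with_retry <- function(f, max_tries = 5, base_wait = 2) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("attempt ", i, " failed: ", conditionMessage(result))
    if (i < max_tries) Sys.sleep(base_wait * 2^(i - 1))
  }
  stop("all ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Possible use inside the RCurl loop (not run here):
# bin <- with_retry(function() getBinaryURL(url1, .opts = opts))
```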
