Using pandas read_csv with zip compression

I am trying to use pandas read_csv to read a ZIP file from an FTP server. The zip file contains only one file, as required.

Here's my code:

pd.read_csv('ftp://ftp.fec.gov/FEC/2016/cn16.zip', compression='zip')

      

I am getting this error:

AttributeError: addinfourl instance has no attribute 'seek'

      

I am getting this error in both pandas 0.18.1 and 0.19.0. Am I missing something, or could it be a bug?



2 answers


While I'm not entirely sure why you are getting the error, you can work around it by opening the URL with urllib2 and reading the data into an in-memory binary stream, as shown. In addition, we must specify the correct delimiter (sep='|' for this file), otherwise we would get another error.

import io
import urllib2 as urllib  # Python 2 only
import pandas as pd

# Download the whole archive, then wrap the bytes in a seekable in-memory buffer
r = urllib.urlopen('ftp://ftp.fec.gov/FEC/2016/cn16.zip')
df = pd.read_csv(io.BytesIO(r.read()), compression='zip', sep='|', header=None)

      



As for the error itself, I think pandas tries to seek within the "zip file" before the URL content has been loaded. At that point it is holding the URL response object rather than an actual zip file, and that object has no seek method, which results in this error.
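To illustrate why buffering helps, here is a minimal sketch that uses only the standard library and no network access. ReadOnlyStream is a hypothetical stand-in for the URL response object (it offers read() but no seek()), and the member name example.txt and its contents are made up for the demonstration. It is not what pandas does internally, only a way to see the same kind of failure and the same fix.

```python
import io
import zipfile


class ReadOnlyStream(object):
    """Hypothetical stand-in for a URL response: it has read() but no seek()."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, size=-1):
        return self._buffer.read(size)


def make_zip_bytes(member_name, text):
    """Build a one-file zip archive in memory and return its raw bytes."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, 'w') as archive:
        archive.writestr(member_name, text)
    return out.getvalue()


def first_member_text(stream):
    """Copy the whole stream into a seekable BytesIO, then read the first member."""
    buffered = io.BytesIO(stream.read())
    with zipfile.ZipFile(buffered) as archive:
        name = archive.namelist()[0]
        return archive.read(name).decode('utf-8')


# Made-up sample data: a pipe-delimited file inside a one-file archive.
payload = make_zip_bytes('example.txt', 'a|b|c\n1|2|3\n')

# Handing the non-seekable stream straight to ZipFile fails, because
# ZipFile has to seek to the end of the file to find the archive index.
try:
    zipfile.ZipFile(ReadOnlyStream(payload))
    direct_open_failed = False
except (AttributeError, zipfile.BadZipFile):
    direct_open_failed = True

# Reading everything into memory first gives ZipFile a seekable object.
text = first_member_text(ReadOnlyStream(payload))
```

The trade-off is that the whole archive is held in memory, which is fine for a file of modest size but worth keeping in mind for very large downloads.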



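import io
import zipfile
import requests
import pandas as pd

# url must be set to the address of the remote zip file before running this
header = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/54.0.1',}
remotezip = requests.get(url, headers=header)
root = zipfile.ZipFile(io.BytesIO(remotezip.content))
for name in root.namelist():
    df = pd.read_csv(root.open(name))  # with several members, df keeps only the last one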

      



Taken from my own blog post: Read zSP zip files in python pandas without downloading the zip file
