Using pandas read_csv with zip compression
I am trying to use pandas read_csv to read a ZIP file from an FTP server. The zip file contains only one file, as required.
Here's my code:
pd.read_csv('ftp://ftp.fec.gov/FEC/2016/cn16.zip', compression='zip')
I am getting this error:
AttributeError: addinfourl instance has no attribute 'seek'
I am getting this error in both pandas 0.18.1 and 0.19.0. Am I missing something, or could it be a bug?
While I'm not entirely sure why you are getting the error, you can work around it by opening the URL with urllib2 and writing the data to an in-memory binary stream, as shown below. In addition, we must specify the correct delimiter (the file is pipe-delimited); otherwise we would get another error.
import io
import urllib2 as urllib
import pandas as pd
r = urllib.urlopen('ftp://ftp.fec.gov/FEC/2016/cn16.zip')
df = pd.read_csv(io.BytesIO(r.read()), compression='zip', sep='|', header=None)
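The snippet above targets Python 2, where urllib2 exists. On Python 3 that module was split into urllib.request and urllib.error, so a rough equivalent might look like the sketch below. The helper name read_zipped_csv is made up for illustration, and the sketch assumes the archive holds a single delimited text file.

```python
import io
import urllib.request

import pandas as pd


def read_zipped_csv(url, **read_csv_kwargs):
    """Download a single-file zip archive and parse it with pandas.

    Hypothetical helper for illustration; not part of pandas.
    """
    # The response object is a stream, so buffer it fully in memory first.
    with urllib.request.urlopen(url) as response:
        buffer = io.BytesIO(response.read())
    # BytesIO supports seek(), which reading a zip archive relies on.
    return pd.read_csv(buffer, compression="zip", **read_csv_kwargs)


# Example call for the file in the question (needs network access):
# df = read_zipped_csv("ftp://ftp.fec.gov/FEC/2016/cn16.zip", sep="|", header=None)
```

Extra keyword arguments such as sep='|' and header=None are passed straight through to read_csv, so the call mirrors the original one-liner.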
As for the error itself, I think pandas is trying to seek within the "zip file" before the URL content has been loaded (so what it holds is a network response object, not a seekable file), which results in this error.
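To illustrate why a missing seek() matters: the zip format keeps its central directory at the end of the archive, so a reader has to jump around in the file. The sketch below uses only the standard library and does not reproduce pandas internals. NonSeekableStream is a made-up stand-in for a network response that offers read() but no seek().

```python
import io
import zipfile


class NonSeekableStream:
    """Hypothetical stand-in for a network response: read() but no seek()."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, size=-1):
        return self._buffer.read(size)


def make_zip_bytes():
    """Build a tiny zip archive in memory with one pipe-delimited member."""
    raw = io.BytesIO()
    with zipfile.ZipFile(raw, "w") as archive:
        archive.writestr("sample.txt", "a|1\nb|2\n")
    return raw.getvalue()


def can_open_as_zip(fileobj):
    """Return True if zipfile can open the object, False if opening fails."""
    try:
        with zipfile.ZipFile(fileobj) as archive:
            archive.namelist()
        return True
    except (AttributeError, OSError, zipfile.BadZipFile):
        return False


data = make_zip_bytes()
print(can_open_as_zip(NonSeekableStream(data)))  # stream without seek()
print(can_open_as_zip(io.BytesIO(data)))         # fully buffered copy
```

The same bytes fail to open through the stream-like wrapper and open fine once copied into a BytesIO, which is the reason the workaround buffers the download first.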
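import io
import zipfile
import pandas as pd
import requests

# url must hold the address of the remote zip file
header = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/54.0.1'}
remotezip = requests.get(url, headers=header)
# wrap the downloaded bytes in a seekable in-memory buffer
root = zipfile.ZipFile(io.BytesIO(remotezip.content))
for name in root.namelist():
    df = pd.read_csv(root.open(name))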
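Note that the loop rebinds df on every pass, so with a multi-file archive only the last member survives. If every member should be kept, one option is to collect the frames in a dict keyed by member name. This is a minimal sketch: read_all_members is a hypothetical helper, and it assumes every member is a delimited text file.

```python
import io
import zipfile

import pandas as pd


def read_all_members(zip_bytes, **read_csv_kwargs):
    """Parse every file in an in-memory zip archive into its own DataFrame.

    Hypothetical helper; assumes each member is a delimited text file.
    """
    frames = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            # archive.open() returns a binary file-like object for the member.
            with archive.open(name) as member:
                frames[name] = pd.read_csv(member, **read_csv_kwargs)
    return frames
```

With the requests-based download it could be called as frames = read_all_members(remotezip.content), passing sep or header through as needed.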
Taken from my own blog post: Read zSP zip files in python pandas without downloading the zip file