Reading zipped CSV files in Python

I am trying to get data from a zipped CSV file. Is there a way to do this without unpacking all the files? If not, how can I unzip the files and read them efficiently?

+14




7 replies


I used the "zipfile" module to read the zipped file into pandas. Let's say the file name is "intfile.csv" and the zip is "THEZIPFILE.zip".



import pandas as pd
import zipfile

zf = zipfile.ZipFile('C:/Users/Desktop/THEZIPFILE.zip') 
df = pd.read_csv(zf.open('intfile.csv'))

      

+32




Yes. You need the "zipfile" module.

You open the zip file itself with zipfile.ZipFile(file[, mode[, compression[, allowZip64]]]).

Then you can use ZipFile.infolist() to list each file in the zip, and read it with ZipFile.open(name[, mode[, pwd]]), which returns a file-like object without unpacking the archive to disk.
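
As a rough illustration of the steps above, here is a minimal sketch that lists the members of an archive and reads each one in memory. The archive is built on the fly so the snippet is self-contained; the member names (a.csv, b.csv) and the helper names build_sample_zip and read_members are made up for the example, not part of the zipfile module.

```python
import io
import zipfile


def build_sample_zip():
    """Build a small in-memory zip with two CSV members (hypothetical names)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("a.csv", "id,name\n1,alice\n")
        zf.writestr("b.csv", "id,name\n2,bob\n")
    buffer.seek(0)
    return buffer


def read_members(zip_source):
    """Return {member name: decoded text} without extracting anything to disk."""
    contents = {}
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries, they hold no data
            with zf.open(info) as member:
                contents[info.filename] = member.read().decode("utf-8")
    return contents


members = read_members(build_sample_zip())
for name, text in members.items():
    print(name, len(text))
```

With a real archive you would pass its path to read_members instead of the in-memory buffer. The UTF-8 decoding is an assumption about the file contents.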

+4




zipfile also supports the with statement.

So, adding to yaron's answer for using pandas:

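import pandas as pd
import zipfile

# Avoid naming the archive "zip", which would shadow the built-in function
with zipfile.ZipFile('file.zip') as zf:
    with zf.open('file.csv') as csv_file:
        df = pd.read_csv(csv_file)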

      

+4




I think Yaron had the better answer, but I thought I would add some code that iterates over multiple files inside a zip archive and then concatenates the results:

import os
import pandas as pd
import zipfile

curDir = os.getcwd()
zf = zipfile.ZipFile(curDir + '/targetfolder.zip')
text_files = zf.infolist()
list_ = []

print ("Uncompressing and reading data... ")

for text_file in text_files:
    print(text_file.filename)
    df = pd.read_csv(zf.open(text_file.filename))
    # do df manipulations
    list_.append(df)

df = pd.concat(list_)

      

+2




If you are not using Pandas, this can be done entirely with the standard library. Here is Python 3.7 code:

import csv
from io import TextIOWrapper
from zipfile import ZipFile

with ZipFile('yourfile.zip') as zf:
    with zf.open('your_csv_inside_zip.csv', 'r') as infile:
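        # The encoding belongs to TextIOWrapper; csv.reader's second argument is a dialect
        reader = csv.reader(TextIOWrapper(infile, encoding='utf-8', newline=''))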
        for row in reader:
            # process the CSV here
            print(row)
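
If the CSV has a header row, the same standard-library approach may be easier to work with through csv.DictReader, which yields one mapping per row keyed by column name. The sketch below is only an illustration: it builds a tiny archive in memory, and the member name people.csv and its columns are invented for the example.

```python
import csv
import io
import zipfile

# Build a tiny in-memory archive so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("people.csv", "id,name\n1,alice\n2,bob\n")
buffer.seek(0)

rows = []
with zipfile.ZipFile(buffer) as zf:
    with zf.open("people.csv") as raw:
        # zf.open() returns bytes, so wrap it to get decoded text lines.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        for row in csv.DictReader(text):
            rows.append(row)

for row in rows:
    print(row["id"], row["name"])
```

All values come back as strings, so numeric columns need converting explicitly.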

      

+2




Pandas has supported zipped CSV files natively since version 0.18.1: the read_csv function has a compression parameter that accepts {'infer', 'gzip', 'bz2', 'zip', 'xz', None}; the default is 'infer'.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
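
A minimal sketch of the compression parameter in use, assuming pandas is installed. It writes a throwaway archive to a temporary directory; the file names sample.zip and sample.csv are invented for the example. Note that with 'zip' the archive must contain exactly one data file, otherwise read_csv raises an error.

```python
import os
import tempfile
import zipfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a zip that holds a single CSV member.
    zip_path = os.path.join(tmp_dir, "sample.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("sample.csv", "id,name\n1,alice\n2,bob\n")

    # Explicitly state the compression ...
    df_explicit = pd.read_csv(zip_path, compression="zip")
    # ... or rely on the default 'infer', which looks at the .zip extension.
    df_inferred = pd.read_csv(zip_path)

print(df_explicit)
```

For archives with several CSV members, the zipfile-based approaches in the other answers are still needed.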

+1




A quick solution could be the code below:

import pandas as pd

# pandas can read a zipped CSV directly (the zip must contain a single file)
df = pd.read_csv("/path/to/file.csv.zip")

      

+1

