Pandas: how to read zip files that each contain multiple txt files?

I have many zip files stored in my path:

  • mypath/data1.zip

  • mypath/data2.zip

  • etc.

Each zip file contains three different txt files. For example, data1.zip contains:

  • data1_a.txt

  • data1_b.txt

  • data1_c.txt

I need to read datai_c.txt from each archive (i.e. data1_c.txt, data2_c.txt, data3_c.txt, etc.) and combine them into one DataFrame.

Unfortunately I can't do this with read_csv alone, because it only works with a zip file that contains a single file.

Any ideas how to do this? Thanks!



2 answers


You need some extra code to open the zip file and read its members. Below is code adapted from the O'Reilly Python Cookbook:



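import io
import zipfile
import pandas as pd

# Make up some data for the example
x = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
x.to_csv('a.txt', sep="|", index=False)
(x * 2).to_csv('b.txt', sep="|", index=False)

# Write both files into one archive
with zipfile.ZipFile('zipfile.zip', 'w') as myzip:
    myzip.write('a.txt')
    myzip.write('b.txt')

# Read every member back and stack the results into one DataFrame
frames = []
with zipfile.ZipFile('zipfile.zip') as z:
    for filename in z.namelist():
        print('File:', filename)
        # z.read returns bytes, so wrap them in BytesIO for read_csv
        frames.append(pd.read_csv(io.BytesIO(z.read(filename)), sep="|"))
df = pd.concat(frames, ignore_index=True)
print(df)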
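Applied to the layout in the question, the same idea could look like the sketch below. It is only an illustration: the function name read_c_files, the suffix argument, and the extra "source" column are assumptions made for this example, and the right read_csv options (such as sep) depend on how the txt files are actually formatted.

```python
import glob
import os
import zipfile

import pandas as pd


def read_c_files(folder, suffix="_c.txt", **read_csv_kwargs):
    """Read the member ending in `suffix` from every zip in `folder`
    and concatenate the results into one DataFrame.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    frames = []
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(zip_path) as z:
            for name in z.namelist():
                if name.endswith(suffix):
                    # z.open gives a binary file-like object that read_csv accepts
                    with z.open(name) as f:
                        frame = pd.read_csv(f, **read_csv_kwargs)
                    # Remember which archive member each row came from
                    frame["source"] = name
                    frames.append(frame)
    # pd.concat raises ValueError if no matching file was found
    return pd.concat(frames, ignore_index=True)
```

A call such as df = read_c_files('mypath', sep='|') would then return one DataFrame with the rows of every matching member, assuming the files are pipe-separated.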

      



You can use the patool library (imported as patoolib) to extract each archive, like this:

import patoolib
import pandas as pd
patoolib.extract_archive('mypath/data1.zip', outdir='mypath', interactive=False, verbosity=-1)

      



Then load each extracted txt file into a DataFrame using read_csv, as in df = pd.read_csv('mypath/data1_a.txt'), and use pd.concat to combine the data frames any way you want.
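Once the archives have been extracted, the read_csv and pd.concat step might be sketched as follows. The helper name concat_extracted and the "*_c.txt" pattern are assumptions for illustration, chosen to match the file names in the question; any read_csv options are passed through unchanged.

```python
import glob
import os

import pandas as pd


def concat_extracted(folder, pattern="*_c.txt", **read_csv_kwargs):
    """Load every file in `folder` matching `pattern` and stack them.

    Hypothetical helper: the name and default pattern are illustrative only.
    """
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    frames = [pd.read_csv(p, **read_csv_kwargs) for p in paths]
    # pd.concat raises ValueError if no file matched the pattern
    return pd.concat(frames, ignore_index=True)
```

For example, df = concat_extracted('mypath') would combine data1_c.txt, data2_c.txt, and so on, provided those files were extracted into mypath.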
