Pandas: how to read a zip file containing multiple txt files?
I have many zip files stored in my path:
- mypath/data1.zip
- mypath/data2.zip
- and so on.
Each zip file contains three different txt files. For example, data1.zip contains:
- data1_a.txt
- data1_b.txt
- data1_c.txt
I need to read datai_c.txt from each archive (i.e. data1_c.txt, data2_c.txt, data3_c.txt, etc.) and combine them into a single DataFrame.
Unfortunately I can't do this with read_csv alone, because it can only read a zip archive that contains a single file.
Any ideas how to do this? Thanks!
So you need different code to access the files inside the zip archive. Below is modified code from the O'Reilly Python Cookbook:
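import zipfile
from io import BytesIO
import pandas as pd
## make up some data for example
x = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
x.to_csv('a.txt', sep="|", index=False)
(x * 2).to_csv('b.txt', sep="|", index=False)
with zipfile.ZipFile('zipfile.zip', 'w') as myzip:
    myzip.write('a.txt')
    myzip.write('b.txt')
## read each file inside the archive and stack them into one DataFrame
frames = []
with zipfile.ZipFile('zipfile.zip') as z:
    for filename in z.namelist():
        print('File:', filename)
        # z.read() returns bytes, so wrap them in BytesIO for read_csv
        frames.append(pd.read_csv(BytesIO(z.read(filename)), sep="|"))
df = pd.concat(frames, ignore_index=True)
print(df)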
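The code above reads every file in one archive. A minimal sketch adapted to the question's layout follows. It assumes all archives sit directly in one folder, that the wanted members are the ones whose names end in _c.txt, and that those files are comma-separated (read_csv's default); the function name read_c_files is hypothetical.

```python
import glob
import os
import zipfile

import pandas as pd


def read_c_files(folder):
    """Concatenate the *_c.txt member of every *.zip archive in `folder`.

    Hypothetical helper; assumes the txt files are comma-separated.
    """
    frames = []
    # sorted() gives a stable, lexicographic order (data10 sorts before data2)
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(zip_path) as z:
            for member in z.namelist():
                if member.endswith("_c.txt"):
                    # ZipFile.open() yields a binary file object read_csv accepts,
                    # so nothing is extracted to disk
                    with z.open(member) as f:
                        frames.append(pd.read_csv(f))
    # pd.concat raises ValueError if no matching member was found
    return pd.concat(frames, ignore_index=True)
```

Calling read_c_files('mypath') would then return one DataFrame holding the rows of data1_c.txt, data2_c.txt, and so on. Reading through ZipFile.open() avoids writing temporary files; pass sep=... to read_csv if the real files use another delimiter.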
You could use the patool library (its module is named patoolib) to extract the archives first, like this:
import patoolib  # the patool package is imported as patoolib
import pandas as pd
# extract every file in data1.zip into mypath (repeat for each archive)
patoolib.extract_archive('mypath/data1.zip', outdir='mypath', interactive=False, verbosity=-1)
Then load each extracted txt file into a DataFrame with read_csv, as in df = pd.read_csv('mypath/data1_a.txt'), and use pd.concat to combine the DataFrames however you want.
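The same extract-then-read idea can be sketched with the standard-library zipfile module in place of patool, so no extra package is needed. This is a sketch under assumptions: it extracts only the members ending in _c.txt, expects flat archives (no sub-folders) holding comma-separated files, and the function name extract_and_read_c_files and the added source column are hypothetical.

```python
import glob
import os
import zipfile

import pandas as pd


def extract_and_read_c_files(folder, outdir):
    """Extract the *_c.txt members of each *.zip in `folder` into `outdir`,
    then read and concatenate them. Hypothetical helper.
    """
    os.makedirs(outdir, exist_ok=True)
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(zip_path) as z:
            # extract only the wanted members instead of the whole archive
            members = [m for m in z.namelist() if m.endswith("_c.txt")]
            z.extractall(outdir, members=members)
    frames = []
    for path in sorted(glob.glob(os.path.join(outdir, "*_c.txt"))):
        frame = pd.read_csv(path)
        # hypothetical extra column recording which file each row came from
        frame["source"] = os.path.basename(path)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

For example, extract_and_read_c_files('mypath', 'mypath/extracted') would leave the extracted _c files on disk and return one combined DataFrame. Extracting to disk is mainly useful if the files are needed again later; otherwise reading straight from the archive avoids the extra copies.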