Unpack a file with Python and return all directories it creates
How can I unzip a file .zip
from Python to some directory output_dir
and get a list of all directories made when unzipping as a result? For example, if I have:
unzip('myzip.zip', 'outdir')
outdir
is a directory that may contain other files / directories. When I unzip myzip.zip
into it, I would like to be able to unzip
return all directories created in outdir/
as a result of zipping. Here is my code:
import zipfile
def unzip(zip_file, outdir):
"""
Unzip a given 'zip_file' into the output directory 'outdir'.
"""
zf = zipfile.ZipFile(zip_file, "r")
zf.extractall(outdir)
How can I make it unzip
return the ones created by it in the outdir
dir? thank.
Edit: The solution that matters the most to me is to get ONLY the top-level directories in the zip file and then recursively go through them, which will ensure that I get all the files made by the zip. Is it possible? The systemic behavior of namelist makes it impossible to rely on
source to share
You can read the contents of the zip file using the method namelist()
. Directories will have a path separator:
>>> import zipfile
>>> zip = zipfile.ZipFile('test.zip')
>>> zip.namelist()
['dir2/', 'file1']
You can do this before or after retrieving the content.
Depending on your operating environment, the result namelist()
may be limited to the top-level paths of the zip archive (e.g. Python on Linux), or it may cover the entire contents of the archive (e.g. IronPython on Windows).
namelist()
returns a complete list of the contents of a zip archive, with directories marked with a path separator. For example, a zip archive with the following file structure:
./file1 ./dir2 ./dir2/dir21 ./dir3 ./dir3/file3 ./dir3/dir31 ./dir3/dir31/file31
returns the following list returned zipfile.ZipFile.namelist()
:
[ 'file1', 'dir2/', 'dir2/dir21/', 'dir3/', 'dir3/file3', 'dir3/dir31/', 'dir3/dir31/file31' ]
source to share
ZipFile.namelist
will return a list of the names of the elements in the archive. However, these names will only be fully qualified filenames, including their directory path. (A zip file can only contain files, not directories, so directories are implied by the names of the archive items.) To identify the directories that are created, you need a list of all directories implicitly created by each file.
The function dirs_in_zip()
below will do this and collect all dir names into a set.
import zipfile
import os
def parent_dirs(pathname, subdirs=None):
"""Return a set of all individual directories contained in a pathname
For example, if 'a/b/c.ext' is the path to the file 'c.ext':
a/b/c.ext -> set(['a','a/b'])
"""
if subdirs is None:
subdirs = set()
parent = os.path.dirname(pathname)
if parent:
subdirs.add(parent)
parent_dirs(parent, subdirs)
return subdirs
def dirs_in_zip(zf):
"""Return a list of directories that would be created by the ZipFile zf"""
alldirs = set()
for fn in zf.namelist():
alldirs.update(parent_dirs(fn))
return alldirs
zf = zipfile.ZipFile(zipfilename, 'r')
print(dirs_in_zip(zf))
source to share
Let him finish and then read the contents of the directory - this is a good example of this.
source to share