Unpack a file with Python and return all directories it creates

How can I unzip a file .zip

from Python to some directory output_dir

and get a list of all directories made when unzipping as a result? For example, if I have:

unzip('myzip.zip', 'outdir')

outdir

is a directory that may contain other files / directories. When I unzip myzip.zip

into it, I would like to be able to unzip

return all directories created in outdir/

as a result of zipping. Here is my code:

import zipfile
def unzip(zip_file, outdir):
    """
    Unzip a given 'zip_file' into the output directory 'outdir'.
    """
    zf = zipfile.ZipFile(zip_file, "r")
    zf.extractall(outdir)

      

How can I make it unzip

return the ones created by it in the outdir

dir? thank.

Edit: The solution that matters the most to me is to get ONLY the top-level directories in the zip file and then recursively go through them, which will ensure that I get all the files made by the zip. Is it possible? The systemic behavior of namelist makes it impossible to rely on

+3


source to share


4 answers


You can read the contents of the zip file using the method namelist()

. Directories will have a path separator:

>>> import zipfile
>>> zip = zipfile.ZipFile('test.zip')
>>> zip.namelist()
['dir2/', 'file1']

      

You can do this before or after retrieving the content.

Depending on your operating environment, the result namelist()

may be limited to the top-level paths of the zip archive (e.g. Python on Linux), or it may cover the entire contents of the archive (e.g. IronPython on Windows).



namelist()

returns a complete list of the contents of a zip archive, with directories marked with a path separator. For example, a zip archive with the following file structure:

./file1
./dir2
./dir2/dir21
./dir3
./dir3/file3
./dir3/dir31
./dir3/dir31/file31

      

returns the following list returned zipfile.ZipFile.namelist()

:

[ 'file1', 
  'dir2/', 
  'dir2/dir21/', 
  'dir3/', 
  'dir3/file3', 
  'dir3/dir31/', 
  'dir3/dir31/file31' ]

      

+8


source


ZipFile.namelist

will return a list of the names of the elements in the archive. However, these names will only be fully qualified filenames, including their directory path. (A zip file can only contain files, not directories, so directories are implied by the names of the archive items.) To identify the directories that are created, you need a list of all directories implicitly created by each file.

The function dirs_in_zip()

below will do this and collect all dir names into a set.



import zipfile
import os

def parent_dirs(pathname, subdirs=None):
    """Return a set of all individual directories contained in a pathname

    For example, if 'a/b/c.ext' is the path to the file 'c.ext':
    a/b/c.ext -> set(['a','a/b'])
    """
    if subdirs is None:
        subdirs = set()
    parent = os.path.dirname(pathname)
    if parent:
        subdirs.add(parent)
        parent_dirs(parent, subdirs)
    return subdirs


def dirs_in_zip(zf):
    """Return a list of directories that would be created by the ZipFile zf"""
    alldirs = set()
    for fn in zf.namelist():
        alldirs.update(parent_dirs(fn))
    return alldirs


zf = zipfile.ZipFile(zipfilename, 'r')

print(dirs_in_zip(zf))

      

+1


source


Let him finish and then read the contents of the directory - this is a good example of this.

0


source


Assuming no one else will write the target directory at the same time, traverse the directory recursively before unzipping, then thereafter and compare the results.

0


source