Selective extracting and opening of files with zipfile in Python


From the docs (http://docs.python.org/2/library/zipfile), it looks like it's possible to selectively extract and open files using the zipfile module in native Python, using:

ZipFile.extract(member[, path[, pwd]])

Extract a member from the archive to the current working directory; member must be its full name or a ZipInfo object. Its file information is extracted as accurately as possible. path specifies a different directory to extract to. member can be a filename or a ZipInfo object. pwd is the password used for encrypted files.
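A minimal sketch of `ZipFile.extract` in Python 3 (the snippets in this post are Python 2). To keep it self-contained it builds a throwaway archive in a temporary directory; the member names mirror the layout below, but the archive and its contents are hypothetical. `extract` returns the normalized path of the file it wrote, and it recreates the member's internal folders under `path`.

```python
import os
import tempfile
import zipfile

# Build a small throwaway archive so the example runs on its own.
# Member names and contents are hypothetical.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "foobar.zip")
with zipfile.ZipFile(archive_path, "w") as zf:
    zf.writestr("foobar/foo/a.txt", "contents of foo/a.txt\n")
    zf.writestr("foobar/bar/c.txt", "contents of bar/c.txt\n")

# Extract a single member, given by its full archive name, into target_dir.
target_dir = os.path.join(workdir, "out")
with zipfile.ZipFile(archive_path, "r") as zf:
    extracted_path = zf.extract("foobar/foo/a.txt", path=target_dir)

# The member's folder structure (foobar/foo/) is recreated under target_dir;
# foobar/bar/c.txt is not extracted.
print(extracted_path)
```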

I have a zip file, foobar.zip, structured like this:

foobar.zip\
  \foo
      \a.txt
      \b.txt
  \bar
      \b.txt
      \c.txt

I've tried to extract the files from a single subdirectory of the .zip file, but sometimes it prints nothing:

import os
import zipfile
with zipfile.ZipFile('foobar.zip','r') as inzipfile:
  for infile in inzipfile.namelist():
    if 'foo' in os.path.split(infile)[0]:
      print inzipfile.open(infile,'r').read()

I've also tried giving a list of the files I want to extract, but that sometimes prints nothing too:

wanted = ['a.txt', 'b.txt']
import os
import zipfile
with zipfile.ZipFile('foobar.zip','r') as inzipfile:
  for infile in inzipfile.namelist():
    if os.path.split(infile)[1] in wanted:
      print inzipfile.open(infile,'r').read()

Edit: There's nothing wrong with the code or with how I'm reading the files. I think something is wrong with my zip file that causes a schroedinbug: sometimes the files in my subdirectories don't open and inzipfile.open(infile,'r').read() returns None. Now it extracts, opens, and prints the contents of the files.

Any idea how to check, within the Python code, that all files in a .zip file can be opened with the selective extract/open method above?

How else can I selectively extract/open files in a zip archive? Is there a more Pythonic way?

Best answer:

There is something wrong with your code: it's also opening and reading the folder entries, which are included in inzipfile.namelist(). You can see this simply with:

print inzipfile.namelist()

Which will output:

['foobar/bar/', 'foobar/bar/b.txt', 'foobar/bar/c.txt', 'foobar/foo/', 
 'foobar/foo/a.txt', 'foobar/foo/b.txt', 'foobar/']

Another way to see it is with inzipfile.printdir(), which should print something along these lines:

File Name                                             Modified             Size
foobar/bar/                                    2014-01-12 08:53:36            0
foobar/bar/b.txt                               2014-01-12 08:54:08           60
foobar/bar/c.txt                               2014-01-12 08:54:28           60
foobar/foo/                                    2014-01-12 08:53:02            0
foobar/foo/a.txt                               2014-01-12 08:55:04           60
foobar/foo/b.txt                               2014-01-12 08:55:24           60
foobar/                                        2014-01-12 08:52:32            0

Notice that in both cases the names of all folder entries end with a / character. You can use that as a simple way to detect them:

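import os
import zipfile

with zipfile.ZipFile('foobar.zip', 'r') as inzipfile:
    # skip folder entries, whose names end with '/'
    for infile in (name for name in inzipfile.namelist() if name[-1] != '/'):
        # compare the parent folder's name exactly; a substring test such as
        # 'foo' in 'foobar/bar' is also true and would match foobar/bar/ files
        if os.path.basename(os.path.dirname(infile)) == 'foo':
            print inzipfile.open(infile,'r').read(),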
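On Python 3.6 and later, `ZipInfo.is_dir()` performs the same trailing-slash test, so the filter can be written against `infolist()` instead. The Python 3 sketch below uses an in-memory archive with explicit folder entries and hypothetical contents. It also shows why the original loop appeared to print nothing: reading a folder entry succeeds but yields an empty byte string. It compares the parent folder's name exactly, since a substring test such as `'foo' in 'foobar/bar'` is true and would also pick up files under `foobar/bar/`. Using `posixpath` is a choice of this sketch, because member names in a zip archive are stored with forward slashes.

```python
import io
import posixpath
import zipfile

# Build an in-memory archive with explicit folder entries, as many zip
# tools create them. Contents are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("foobar/", b"")
    zf.writestr("foobar/foo/", b"")
    zf.writestr("foobar/foo/a.txt", b"A\n")
    zf.writestr("foobar/foo/b.txt", b"B in foo\n")
    zf.writestr("foobar/bar/", b"")
    zf.writestr("foobar/bar/b.txt", b"B in bar\n")

with zipfile.ZipFile(buf, "r") as zf:
    # A folder entry can be read, but it holds no data.
    empty = zf.read("foobar/foo/")

    # Keep only file entries whose parent folder is named exactly 'foo'.
    foo_contents = {}
    for info in zf.infolist():
        if info.is_dir():  # Python 3.6+: True when the name ends with '/'
            continue
        parent = posixpath.dirname(info.filename)
        if posixpath.basename(parent) == "foo":
            foo_contents[info.filename] = zf.read(info)

print(repr(empty))
for name, data in sorted(foo_contents.items()):
    print(name, data)
```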

Likewise:

import os
import zipfile

wanted = {'a.txt', 'b.txt'}  # use a set, it's faster for testing membership
with zipfile.ZipFile('foobar.zip', 'r') as inzipfile:
    for infile in (name for name in inzipfile.namelist() if name[-1] != '/'):
        if os.path.split(infile)[1] in wanted:
            print inzipfile.open(infile, 'r').read()
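If the selected files should be written to disk rather than printed, `ZipFile.extractall()` takes a `members` argument (a subset of `namelist()`) that restricts what gets extracted. A minimal Python 3 sketch, assuming a throwaway archive with hypothetical contents that mirrors the layout in the question:

```python
import os
import posixpath
import tempfile
import zipfile

wanted = {"a.txt", "b.txt"}

# Throwaway archive mirroring the question's layout (hypothetical contents).
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "foobar.zip")
with zipfile.ZipFile(archive_path, "w") as zf:
    for name in ("foobar/foo/a.txt", "foobar/foo/b.txt",
                 "foobar/bar/b.txt", "foobar/bar/c.txt"):
        zf.writestr(name, "data for " + name + "\n")

out_dir = os.path.join(workdir, "out")
with zipfile.ZipFile(archive_path, "r") as zf:
    # File entries (not folders) whose base name is in `wanted`.
    selected = [name for name in zf.namelist()
                if not name.endswith("/")
                and posixpath.basename(name) in wanted]
    # Extract only those members, keeping their folder structure.
    zf.extractall(path=out_dir, members=selected)

# List what landed on disk, relative to out_dir, with '/' separators.
extracted = sorted(
    os.path.relpath(os.path.join(root, f), out_dir).replace(os.sep, "/")
    for root, _dirs, files in os.walk(out_dir)
    for f in files
)
print(extracted)
```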

The only way I can think of to check whether all the [file] members of an archive can be opened is to actually try opening each one:

import zipfile

def check_files(zipfilename):
    """ Check whether all file members of a .zip archive can be opened.
        Beware of vacuous truth: all members of an empty archive can be opened.
    """
    def can_open(archive, membername):
        try:
            archive.open(membername, 'r')  # return value ignored
        except (RuntimeError, zipfile.BadZipfile, zipfile.LargeZipFile):
            return False
        return True

    with zipfile.ZipFile(zipfilename, 'r') as archive:
        return all(can_open(archive, membername)
                    for membername in (
                        name for name in archive.namelist() if name[-1] != '/'))
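Note that `open()` only reads a member's local header, so a check like the one above can succeed for a member whose data is corrupted; that kind of damage typically surfaces only when the member is actually read. If the data should be verified too, `ZipFile.testzip()` reads every member and checks its CRC-32, returning the name of the first bad member or `None`. A Python 3 sketch; the helper `first_bad_member` and the deliberately corrupted archive are hypothetical, built only to exercise both outcomes.

```python
import io
import zipfile

def first_bad_member(archive_bytes):
    """Hypothetical helper: return the name of the first member whose data
    fails its CRC check, or None if every member reads back cleanly."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes), "r") as zf:
        return zf.testzip()

# Build a small uncompressed archive (hypothetical contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("foobar/foo/a.txt", b"payload-data-123")
    zf.writestr("foobar/bar/c.txt", b"other-contents!!")
good = buf.getvalue()

# Change one member's stored bytes without changing their length: the
# archive structure stays valid, but that member's CRC-32 no longer matches.
bad = good.replace(b"payload-data-123", b"PAYLOAD-DATA-123", 1)

print(first_bad_member(good))
print(first_bad_member(bad))
```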