Updating os.walk file lists in real-time [python]


I have a function which I want to enumerate through all files and folders from a target folder. When/if it finds rar files I want it to extract them and then delete them. In the case of multi-part archives it will also check for and delete the remaining files (which have already been extracted with the first volume).

I was using os.listdir in a for loop, but this approach has two problems: a) I don't think it will handle subfolders unless I write a recursive function for them (which I don't want to do because recursion hurts my head). b) os.listdir builds its list of names only once, before the loop starts, so when the loop reaches a file name that was already removed in a prior iteration, the file can't be found and the operation fails.

It appears os.walk may be better for "a)" above, and my research so far shows that I should be able to update the os.walk in realtime on each iteration. However I can't figure out how to do this.

I've got something like this:

for root, dirs, files in os.walk('d:\\test'):
    for file in files:
        print 'files (before remove): ', file, files
        # This is where I would do some operation that deletes one or more files.
        files.remove(file)
        print 'files (after remove): ', file, files

However the output is like this:

D:\test>d:\Python27\python.exe d:\file.py
files (before remove):  Crystal.part01.rar ['Crystal.part01.rar', 'Crystal.part02.rar', 'Crystal.part03.rar', 'Crystal.part04.rar', 'Crystal.part05.rar', 'Crystal.part06.rar']
files (after remove):  Crystal.part01.rar ['Crystal.part02.rar', 'Crystal.part03.rar', 'Crystal.part04.rar', 'Crystal.part05.rar', 'Crystal.part06.rar']
files (before remove):  Crystal.part03.rar ['Crystal.part02.rar', 'Crystal.part03.rar', 'Crystal.part04.rar', 'Crystal.part05.rar', 'Crystal.part06.rar']
files (after remove):  Crystal.part03.rar ['Crystal.part02.rar', 'Crystal.part04.rar', 'Crystal.part05.rar', 'Crystal.part06.rar']
files (before remove):  Crystal.part05.rar ['Crystal.part02.rar', 'Crystal.part04.rar', 'Crystal.part05.rar', 'Crystal.part06.rar']
files (after remove):  Crystal.part05.rar ['Crystal.part02.rar', 'Crystal.part04.rar', 'Crystal.part06.rar']

I think this makes sense... we can see the list getting updated. However, the inner for loop keeps its place in the list by position, so after an item is removed the remaining items shift left by one and the next iteration jumps over one of them, creating a "skip" effect.
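The skip effect can be reproduced without touching the filesystem. A minimal sketch in Python 3 syntax, with made-up file names:

```python
# Illustrative names only; nothing is read from or written to disk.
names = ['part01.rar', 'part02.rar', 'part03.rar', 'part04.rar']

visited = []
for name in names:
    visited.append(name)
    # Removing the current item shifts every later item one slot left,
    # so the loop's next position lands one item further than intended.
    names.remove(name)

print(visited)  # ['part01.rar', 'part03.rar']
print(names)    # ['part02.rar', 'part04.rar']
```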

How can I achieve operating on each file in the directory, except letting the calling loop know to skip an item that has been removed?

Update - I may be incorrect in assuming this can be done. What gave me this idea was this snippet from the Python docs:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.

On reading it again I see it only mentions dirnames and not filenames - so while I still don't understand the exact method to accomplish this, it looks like you may only be able to manipulate dirnames in place.
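For reference, the in-place pruning that the docs describe might look roughly like this. It is a sketch in Python 3 syntax; the helper name `walk_pruned` and the folder names `keep` and `skip` are made up for illustration.

```python
import os
import shutil
import tempfile

def walk_pruned(top, skip_names):
    """Yield each visited directory path, never descending into skip_names."""
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment changes the very list object os.walk consults
        # when deciding where to recurse; rebinding `dirs` would not.
        dirs[:] = [d for d in dirs if d not in skip_names]
        yield root

# Throwaway tree: base/keep and base/skip (hypothetical names).
base = tempfile.mkdtemp()
for sub in ('keep', 'skip'):
    os.makedirs(os.path.join(base, sub))

visited = sorted(os.path.relpath(p, base) for p in walk_pruned(base, {'skip'}))
shutil.rmtree(base)
print(visited)  # ['.', 'keep']
```

This only controls which subdirectories are visited; it does nothing for entries in the files list.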

There is 1 solution below

wwii On
for root, dirs, files in os.walk('d:\\test'):
    for file in files:
        pass  # process stuff

files is the list that you are iterating over - you should not modify it inside the loop, as you have discovered. If processing one file deletes another file that the for loop has not reached yet, you can do three things (that I can think of):

  1. Check to see if the file is there before you process it

    # Look in the directory being walked (root), not the current working directory
    if not os.path.exists(os.path.join(root, fname)):
        continue
    
  2. Use a try/except to catch the IOError. If you want to limit the exception handling further, you can check the error's errno for "No such file or directory" (errno.ENOENT) in the except suite and re-raise the exception if it is something different. This is more robust than searching the error text.

    import errno

    fname = 'foo.bar'
    try:
        with open(fname) as f:
            pass
    except IOError as e:
        # Ignore a missing file; re-raise anything else
        if e.errno != errno.ENOENT:
            raise
    
  3. I guess you could keep a separate list/set that contains all the files that your process has deleted and check if the file is in it before trying to process it.
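The third option could be sketched as follows, in Python 3 syntax. Both function names are hypothetical, and `remove_volume_set` is only a stand-in: it performs no extraction and simply deletes the sibling .rar volumes so that the bookkeeping can be shown.

```python
import os
import shutil
import tempfile

def remove_volume_set(path):
    """Hypothetical stand-in for "extract, then delete every volume".

    No extraction happens here; it only deletes the .rar files in the same
    folder that share the name before '.part' and returns their paths.
    """
    folder, name = os.path.split(path)
    if not name.endswith('.rar'):
        return []
    prefix = name.split('.part')[0]
    removed = []
    for other in os.listdir(folder):
        if other.startswith(prefix) and other.endswith('.rar'):
            full = os.path.join(folder, other)
            os.remove(full)
            removed.append(full)
    return removed

def process_tree(top, handle):
    """Call handle(path) for every file that an earlier call has not deleted."""
    deleted = set()
    processed = []
    for root, dirs, files in os.walk(top):
        for name in files:  # the files list itself is never modified
            path = os.path.join(root, name)
            if path in deleted:
                continue  # already removed while handling an earlier file
            processed.append(path)
            deleted.update(handle(path))
    return processed

# Throwaway folder with a three-volume archive and one unrelated file.
base = tempfile.mkdtemp()
for name in ('Crystal.part01.rar', 'Crystal.part02.rar',
             'Crystal.part03.rar', 'notes.txt'):
    open(os.path.join(base, name), 'w').close()

processed = process_tree(base, remove_volume_set)
remaining = sorted(os.listdir(base))
shutil.rmtree(base)
print(len(processed), remaining)  # 2 ['notes.txt']
```

Only one of the three volumes is handled (whichever os.walk lists first), the other two are skipped through the set, and the files list is left alone, so nothing is skipped by accident.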


If you really needed to, you can write a class with the behavior you need.

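# Runs on Python 2.7 and Python 3
import collections

class F(collections.deque):
    """A deque that is consumed from the right as it is iterated."""
    def __iter__(self):
        return self
    def next(self):
        try:
            return self.pop()
        except IndexError:
            raise StopIteration
    __next__ = next  # Python 3 spells the iterator method __next__

a = [1,2,3,4]
f = F(a)
for n in f:
    print(n)
    if n == 3:
        f.remove(2)  # removed before the loop reaches it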

Result

>>> 
4
3
1
>>>
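The same consume-as-you-go idea can be written without subclassing. Here is a sketch in Python 3 syntax that processes names front to back (the class above pops from the right, hence its reversed output). The file names, and the rule that handling the first volume removes the rest, are assumptions for illustration.

```python
from collections import deque

# Illustrative names only; nothing is read from disk.
pending = deque(['part01.rar', 'part02.rar', 'part03.rar', 'notes.txt'])
seen = []

while pending:
    name = pending.popleft()
    seen.append(name)
    if name == 'part01.rar':
        # Stand-in for "extracting the first volume also deletes the rest".
        for other in [n for n in pending if n.endswith('.rar')]:
            pending.remove(other)

print(seen)  # ['part01.rar', 'notes.txt']
```

Because each item is taken off the queue before it is handled, removing items that are still pending cannot shift the loop's position.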