How to perform a case-insensitive search for files of a given suffix?


I'm looking for the equivalent of find $DIR -iname '*.mp3', and I don't want to do the kooky ['mp3', 'Mp3', 'MP3', etc.] thing. But I can't figure out how to combine re.IGNORECASE with the simple endswith() approach. My goal is to not miss a single file, and I'd like to eventually expand this to other media types and suffixes.

import os
import re
suffix = ".mp3"

mp3_count = 0

for root, dirs, files in os.walk("/Volumes/audio"):
    for file in files:
        # if file.endswith(suffix):
        if re.findall('mp3', suffix, flags=re.IGNORECASE):
            mp3_count += 1

print(mp3_count)

TIA for any feedback


There are 3 best solutions below

1
On BEST ANSWER

You can try this :)

import os
# import re
suffix = "mp3"

mp3_count = 0

for root, dirs, files in os.walk("/Volumes/audio"):
    for file in files:
        # if file.endswith(suffix):
        if file.split('.')[-1].lower() == suffix:
            mp3_count += 1

print(mp3_count)

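Python's str.split() separates the string into a list at the given delimiter, and [-1] picks the last element of that list, which is the suffix. Calling .lower() on it before comparing makes the check case-insensitive. Note that a file named just mp3, with no dot at all, would also match.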
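Since the question asks how to keep the simple endswith() approach, here is a minimal sketch of that variant: lowercase the filename first, then call endswith() with a tuple so that several suffixes are checked at once. The names MEDIA_SUFFIXES, has_suffix and count_files are made up for illustration, and the list of extensions is only an assumption about which media types you might want.

```python
import os

# Hypothetical set of extensions to look for: lowercase, with the leading dot.
MEDIA_SUFFIXES = (".mp3", ".flac", ".wav")

def has_suffix(filename, suffixes=MEDIA_SUFFIXES):
    """Return True if filename ends with any of the suffixes, ignoring case."""
    # str.endswith accepts a tuple, so one call checks every suffix.
    return filename.lower().endswith(tuple(suffixes))

def count_files(top, suffixes=MEDIA_SUFFIXES):
    """Walk top recursively and count the files whose names match."""
    count = 0
    for root, dirs, files in os.walk(top):
        count += sum(1 for name in files if has_suffix(name, suffixes))
    return count
```

Usage would be something like count_files("/Volumes/audio"). Because the suffixes include the dot, a file such as test.amp3 or a file named just mp3 is not counted.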

0
On

The regex equivalent of .endswith is the $ sign.

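To use your example above, you could do this (note that the match has to run against file, not suffix, otherwise the test is true for every file):

if re.findall('mp3$', file, flags=re.IGNORECASE):

Though it might be more accurate to do this;

if re.findall(r'\.mp3$', file, flags=re.IGNORECASE):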

which makes sure that the filename ends with .mp3 rather than picking up files such as test.amp3.

This is a pretty good example of a situation that doesn't really require regex - so while you're welcome to learn from these examples, it's worth considering the alternatives provided by other answerers.
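If you do want to stay with regex and later expand to other suffixes, a rough sketch could look like the following. The extensions list and the matches helper are hypothetical names for illustration only; the pattern is the same anchored, escaped-dot idea as above, just with several alternatives.

```python
import re

# Hypothetical list of extensions, without the leading dot.
extensions = ["mp3", "flac", "wav"]

# Builds r'\.(?:mp3|flac|wav)$'; re.escape guards against regex metacharacters.
pattern = re.compile(
    r"\.(?:" + "|".join(re.escape(ext) for ext in extensions) + r")$",
    flags=re.IGNORECASE,
)

def matches(filename):
    """Return True if filename ends in one of the extensions, ignoring case."""
    return pattern.search(filename) is not None
```

Compiling the pattern once outside the loop avoids rebuilding it for every file, and re.search is enough here because only a yes/no answer is needed.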

3
On

Don't bother with os.walk. Learn to use the easier, awesome pathlib.Path instead. Like so:

from pathlib import Path

suffix = ".mp3"

mp3_count = 0

p = Path('/Volumes')/'audio'  # note the easy path creation syntax
# OR even:
p = Path('/')/'Volumes'/'audio'

for subp in p.rglob('*'): #  recursively iterate all items matching the glob pattern
    # .suffix property refers to .ext extension
    ext = subp.suffix
    # use the .lower() method to get lowercase version of extension
    if ext.lower() == suffix: 
        mp3_count += 1

print(mp3_count)

"One-liner", if you're into that sort of thing (multiple lines for clarity):

sum(1 for subp in (Path('/Volumes')/'audio').rglob('*')
     if subp.suffix.lower() == suffix)
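To extend the pathlib approach to several media types, one possible sketch is below. The function name count_by_suffix is made up for this example, and the is_file() check is an addition of mine so that a directory that happens to be named something.mp3 is not counted.

```python
from pathlib import Path

def count_by_suffix(top, suffixes):
    """Count files under top whose extension is in suffixes, ignoring case."""
    # Normalise the wanted suffixes once, e.g. {".mp3", ".flac"}.
    wanted = {s.lower() for s in suffixes}
    return sum(
        1
        for p in Path(top).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )
```

Usage would be along the lines of count_by_suffix("/Volumes/audio", [".mp3", ".flac"]).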