I was having trouble with a Python script opening a file that contained an umlaut character. Naturally I thought I could correct this with a unicode utf8 fix, but not so...
I ended up using the mbcs encoding (the default is cp1252).
Then I wrote this function, which I would like to make MUCH cleaner:
    def len(fname):
        i = -1
        try:
            with open(fname, encoding='mbcs') as f:
                for i, l in enumerate(f):
                    pass
        except UnicodeDecodeError:
            try:
                i = -1
                with open(fname, encoding='utf8') as f:
                    for i, l in enumerate(f):
                        pass
            except UnicodeDecodeError:
                i = -1
                with open(fname) as f:
                    for i, l in enumerate(f):
                        pass
        return i + 2 # 2 because it starts at -1 not 0
You're almost certainly going about this all wrong, as explained in the comments… but if you really do need to do something like this, here's how to simplify it:
The general solution to avoid repeating yourself is to use a loop. You've got the same code three times, with the only difference being the `encoding`, so loop over three encodings instead. (In your case, the third attempt didn't pass an `encoding` at all, so you do have to know the default value of the parameter, but the docs or `help` will tell you that.) The only wrinkle is that you apparently don't want to handle exceptions in the third case; the easiest way to do that is to reraise the last exception if they all fail.
While we're at it: There's no need to "declare" `i` up-front the way you do; the `for` loop is just going to start at 0 and erase whatever you put there (the one exception is an empty file, where the loop body never runs and `i` keeps its initial value). That also means the `+2` at the end is wrong: since `enumerate` starts at 0, the count is `i + 1`. But there's an easier way to get the length of an iterable in the first place: just feed it into something that consumes an iterable. A custom `ilen` function written in C would be ideal, but people have tested various different Python implementations, and `sum(1 for _ in iterable)` is almost as fast as the perfect solution, and dead simple, so it's the most common idiom. If this isn't obvious to you, factor it out as a function and call it `ilen`, and give it a nice docstring and/or comment. Or just `pip install more-itertools` and then you can just call `more_itertools.ilen(f)`.
Anyway, putting it all together:
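One way this could look, as a minimal sketch rather than a definitive implementation. The names `count_lines` and `encodings` are made up for illustration (the function is renamed so it no longer shadows the built-in `len`), `ilen` is the small helper described above, and `locale.getpreferredencoding(False)` is assumed here as a stand-in for whatever `open` uses when no encoding is passed. Note that `mbcs` only exists on Windows, so the default tuple assumes the asker's platform.

```python
import locale


def ilen(iterable):
    """Count the items in an iterable by consuming it."""
    return sum(1 for _ in iterable)


def count_lines(fname, encodings=None):
    """Return the number of lines in fname, trying each encoding in turn.

    Hypothetical helper: the default encodings mirror the question
    ('mbcs' is Windows-only; pass your own tuple on other platforms).
    """
    if encodings is None:
        # Assumption: the locale's preferred encoding approximates the
        # default that open() falls back to when encoding is omitted.
        encodings = ("mbcs", "utf8", locale.getpreferredencoding(False))

    last_error = None
    for encoding in encodings:
        try:
            with open(fname, encoding=encoding) as f:
                # Decoding happens lazily while iterating, so the count
                # must stay inside the try block.
                return ilen(f)
        except UnicodeDecodeError as e:
            last_error = e  # remember it and try the next encoding

    if last_error is None:
        raise ValueError("no encodings given")
    # Every encoding failed: reraise the last decoding error.
    raise last_error
```

Calling `count_lines("data.txt")` on Windows would try `mbcs`, then `utf8`, then the locale default; on other platforms an explicit tuple such as `count_lines("data.txt", encodings=("utf8", "latin-1"))` avoids the unknown `mbcs` codec.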