Python code readability assistance


I would like to improve my code's readability and formatting. The code below works, but I feel it could be tighter, and I can't seem to get it to work any other way. The idea is to read a .txt file, find the lines that record incoming e-mails, and count the messages by the hour they were sent.

Here is an example line that I'm looking for in the text:

From [email protected] Sat Jan 5 09:14:16 2008

Here is my code as it is today.

fname = input("Enter file:")
if len(fname) <1 : fname = "mbox-short.txt"
fh = open(fname)
time = list()
hours = list()
hr = dict()

for line in fh:
        if not line.startswith("From "): continue
        words = line.split()
        time.append(words[5])

for i in time:
        i.split(":")
        hours.append(i[:2])

for h in hours:
        hr[h]=hr.get(h, 0)+1

l = list()
for k,v in hr.items():
        l.append((k,v))
l.sort()
for k, v in l:
        print (k,v)

There are 3 best solutions below

1
A.H On

Just a few hints. (Don't try this at home, it's really bad code :D, but it shows some Python constructs worth learning: operator, defaultdict, and list comprehensions.)

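from collections import defaultdict
import operator

fname = "mbox-short.txt"  # default file name from the question
hr = defaultdict(int)

with open(fname) as fh:
    # The sixth field is HH:MM:SS; keep only the hour so each item is a hashable string.
    hours = [data.split()[5].split(":")[0] for data in fh if data.startswith("From ")]

for h in hours:
    hr[h] += 1

# itemgetter(1) sorts by count; use itemgetter(0) to sort by hour as in the question.
sorted_hr = sorted(hr.items(), key=operator.itemgetter(1))
for k, v in sorted_hr:
    print(k, v)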
1
Pēteris Caune On

Here's (what I think is) functionally equivalent code:

from collections import Counter

fname = input("Enter file: ")
if fname == "":
    fname = "mbox-short.txt"

hour_counts = Counter()
with open(fname) as f:
    for line in f:
        if not line.startswith("From "):
            continue
        words = line.split()
        time = words[5]
        hour = time[:2]
        hour_counts[hour] += 1

for hour, count in sorted(hour_counts.items()):
    print(hour, count)

You might also want to parse mbox format with an existing Python library, instead of doing it yourself.
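
One way to follow that suggestion is the standard library's mailbox module. The following is a minimal sketch rather than a drop-in replacement: the function name count_hours is made up for illustration, and it assumes every envelope line has the same field layout as the example in the question (address, weekday, month, day, time, year).

```python
import mailbox
from collections import Counter


def count_hours(path):
    """Count messages per hour using the envelope "From " line of each message.

    count_hours is a hypothetical helper name, not part of any library.
    """
    hour_counts = Counter()
    # create=False makes a missing file raise an error instead of creating an empty mailbox.
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            # get_from() returns the envelope line without the leading "From ".
            fields = message.get_from().split()
            # Assumed layout: address, weekday, month, day, HH:MM:SS, year.
            hour_counts[fields[4][:2]] += 1
    finally:
        box.close()
    return hour_counts


# Example usage (assumes mbox-short.txt is in the current directory):
# for hour, count in sorted(count_hours("mbox-short.txt").items()):
#     print(hour, count)
```

The library handles splitting the file into messages, so the loop no longer needs the startswith("From ") check; the field positions are still taken on trust from the example line.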

0
repzero On

A regex approach would look something like this:

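import re

# Envelope line: "From <address> ...". The address pattern is strict: an
# alphanumeric local part, one domain label, a dot, and three lowercase letters.
from_line = re.compile(r"^From [A-Za-z0-9]+@[a-zA-Z]+\.[a-z]{3}")
# Capture the two-digit hour of an HH:MM:SS timestamp followed by a four-digit year.
hour_pattern = re.compile(r"([0-9]{2}):[0-9]{2}:[0-9]{2} [0-9]{4}")

hours = []
with open("new_file") as textfile:
    for line in textfile:
        if from_line.search(line):
            match = hour_pattern.search(line)
            if match:
                hours.append(match.group(1))

hours.sort()
print(hours)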

For example, suppose the file new_file contains the data below:

kwejrhkhwr
From [email protected] Sat Jan 5 09:14:16 2008
From [email protected] Sat Dec 31 01:40:16 2015
Something not needed here
Something not needed here
From [email protected] Sat Oct 25 44:03:10 2015

Output of hours in ascending order

['01', '09', '44']
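
The list above holds one entry per matching line, so the per-hour frequencies asked for in the question can be derived from it. A minimal sketch, assuming the hours list produced by the sample data:

```python
from collections import Counter

# Assumed input: the hours extracted from the sample file above.
hours = ['01', '09', '44']

# Counter tallies how many times each hour string occurs.
hour_counts = Counter(hours)
for hour, count in sorted(hour_counts.items()):
    print(hour, count)
```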