Python code to filter logfile by a specific time


I've run into an issue with the following code:

from datetime import datetime, timedelta
def find_last_index(file_rec):
    time = datetime.now() - timedelta(hours=2)
    file_content = file_rec
    while True:
        ind = file_content.find(time.strftime("%m-%d"))
        date_obj = datetime.strptime(file_content[ind:13], '%m-%d %H:%M:%S')
        if time.hour > date_obj.hour:
            file_content = file_content[ind+5:]
            ind = file_content.find("12-22", ind)
            return ind
        else:
            file_content = file_content[ind + 1:]


file_name = raw_input("Enter File Path From this file's dir: ")
read_file = open(file_name, 'r')
content = read_file.read()
read_file.close()
lastindex = find_last_index(content)
print content[:lastindex]
content = input()
write_file = open("ResultFile.txt", "w")
write_file.write(content[:lastindex])
write_file.close()

The code is supposed to take a log file that looks like this:

12-22 20:14:15.972 26560 27796 D Robocol : no packet received: NullPointerException
12-22 20:14:15.972 26560 27796 D Robocol : no packet received: NullPointerException
12-22 20:14:15.972 26560 27796 D Robocol : no packet received: NullPointerException
12-22 20:14:15.972 26560 27796 D Robocol : no packet received: NullPointerException
12-22 20:14:15.973 26560 27796 D Robocol : no packet received: NullPointerException
12-22 20:14:15.973 26560 27796 D Robocol : no packet received: NullPointerException
12-22 20:14:15.973 26560 27796 D Robocol : no packet received: NullPointerException
12-22 20:14:15.973 26560 27796 D Robocol : no packet received: NullPointerException
12-22 20:14:15.973 26560 27796 D Robocol : no packet received: NullPointerException
12-22 20:14:15.974 26560 27796 D Robocol : no packet received: NullPointerException
12-22 20:14:15.974 26560 27796 D Robocol : no packet received: NullPointerException
12-22 20:14:15.974 26560 27796 D Robocol : no packet received: NullPointerException
12-22 20:14:15.9

Each line starts with the date and time. I would like to write to a new file only the entries from two hours ago up to the current time. It would be awesome if someone could help me solve it.


There are 2 best solutions below


I can't resist suggesting the arrow module for manipulating dates. In many cases it makes life easier. Here's what I offer.

>>> import arrow
>>> refTime = arrow.now().shift(hours=-2).strftime('%m-%d %H:%M:%S')
>>> refTime
'12-27 13:13:57'
>>> str(refTime)
'12-27 13:13:57'
>>> refTime_as_str = str(refTime)
>>> with open('logfile.txt') as log:
...     for line in log:
...         if line[:len(refTime_as_str )] >= refTime_as_str:
...             print (line.strip())
...             
12-27 15:45:50.972 26560 27796 D Robocol : no packet received: NullPointerException
12-27 15:45:50.972 26560 27796 D Robocol : no packet received: NullPointerException
12-27 15:45:50.972 26560 27796 D Robocol : no packet received: NullPointerException
12-27 15:45:50.972 26560 27796 D Robocol : no packet received: NullPointerException
12-27 15:45:50.973 26560 27796 D Robocol : no packet received: NullPointerException
12-27 15:45:50.973 26560 27796 D Robocol : no packet received: NullPointerException
12-27 15:45:50.973 26560 27796 D Robocol : no packet received: NullPointerException
12-27 15:45:50.973 26560 27796 D Robocol : no packet received: NullPointerException
12-27 15:45:50.973 26560 27796 D Robocol : no packet received: NullPointerException
12-27 15:45:50.974 26560 27796 D Robocol : no packet received: NullPointerException
12-27 15:45:50.974 26560 27796 D Robocol : no packet received: NullPointerException
12-27 15:45:50.974 26560 27796 D Robocol : no packet received: NullPointerException
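
For comparison, the same filter can be sketched with only the standard library, parsing each timestamp instead of comparing strings. This is a minimal Python 3 sketch, not the approach above: the helper name recent_lines is hypothetical, and because the log stamps carry no year, it assumes every entry belongs to the year of the `now` value passed in.

```python
from datetime import datetime, timedelta

STAMP_LEN = len("12-22 20:14:15")  # "MM-DD HH:MM:SS" is 14 characters


def recent_lines(lines, now, hours=2):
    """Yield the lines whose leading timestamp falls within the last `hours` hours."""
    cutoff = now - timedelta(hours=hours)
    for line in lines:
        try:
            # Assumption: the log has no year, so borrow the year from `now`.
            when = datetime.strptime(
                f"{now.year}-{line[:STAMP_LEN]}", "%Y-%m-%d %H:%M:%S"
            )
        except ValueError:
            continue  # skip lines that do not start with a parsable timestamp
        if cutoff <= when <= now:
            yield line
```

To produce the result file, an open log file can be passed as `lines` with `datetime.now()` as `now`, and the yielded lines written out with the output file's writelines method. Fractions of a second are ignored in the comparison.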

I believe I found it by reverse-engineering your code logic.

Your function find_last_index continually updates its local copy of the input file. Eventually, it returns the index of the first entry that's at least two (truncated) hours old -- but it's the index in that local copy. Up until you find that time, you're chopping off 5-character dates and then individual characters, so you eventually get an index that's less than 20.

Back in the main program, you still have the original data, kept in variable content. You apply the index returned from your routine (stored in lastindex), an index that no longer applies to content.
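
The mismatch can be seen on a tiny example. The following is an illustrative Python 3 sketch with made-up log lines, not the asker's data: an index found in a truncated copy is off by exactly the number of characters that were chopped from the front.

```python
log = (
    "12-22 18:00:00.000 first\n"
    "12-22 19:00:00.000 second\n"
    "12-22 20:00:00.000 third\n"
)

# Mimic the loop: keep working on a copy with the first line chopped off.
first_line_len = len("12-22 18:00:00.000 first\n")
truncated = log[first_line_len:]

# Index of the 20:00 entry, measured from the start of the truncated copy.
stale_index = truncated.find("12-22 20:00")
# Index of the same entry in the original, untouched string.
true_index = log.find("12-22 20:00")

# The difference is the amount of text that was discarded.
offset = true_index - stale_index
print(stale_index, true_index, offset)
```

Slicing the original string with stale_index therefore cuts in the wrong place unless the discarded length is added back.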

If you want to retain your current logic flow, then either

  • return file_content to the main program, or
  • instead of deleting the front of file_content on each failed iteration, update ind and start your search from that point.

Possible code change:

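# Search forward from ind instead of slicing, so ind stays valid for the whole string.
ind = file_content.find(time.strftime("%m-%d"), ind)
# "MM-DD HH:MM:SS" is 14 characters long.
date_obj = datetime.strptime(file_content[ind:ind+14], '%m-%d %H:%M:%S')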
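
Taking the second option further, here is one way the whole search might look when the text is never sliced and a position is advanced instead. It is a Python 3 sketch under stated assumptions: the function name find_cutoff_index and the `year` parameter are hypothetical (the log stamps have no year, so the caller supplies one), and it compares full timestamps rather than only the hour.

```python
from datetime import datetime

STAMP_LEN = len("12-22 20:14:15")  # "MM-DD HH:MM:SS" is 14 characters


def find_cutoff_index(text, cutoff, year):
    """Return the index in `text` of the first line stamped at or after `cutoff`.

    Returns len(text) when no line qualifies. `year` is an assumption supplied
    by the caller, because the log timestamps do not include one.
    """
    pos = 0
    while pos < len(text):
        stamp = text[pos:pos + STAMP_LEN]
        try:
            when = datetime.strptime(f"{year}-{stamp}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            when = None  # this line does not start with a timestamp
        if when is not None and when >= cutoff:
            return pos  # valid for the original string, since nothing was sliced
        newline = text.find("\n", pos)  # search forward from pos
        if newline == -1:
            break
        pos = newline + 1
    return len(text)
```

With idx as the returned value, content[idx:] holds the entries from the cutoff onward, which is the part the question wants written to the new file.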