While reading multiple files with Python, how can I search for the recurrence of an error string?


I've just started playing with Python and I'm running some tests in my environment. The idea is to create a simple script that finds the recurrence of errors over a given period of time.

Basically, I want to count the number of times a server fails in my daily logs. If the failure happens more than a given number of times (say, 10) over a given period (say, 30 days), I should raise an alert in a log. However, I'm not trying to simply count the repetitions of errors over a 30-day interval. What I actually want is to count the number of times the error happened, recovered, and then happened again; this way I avoid reporting more than once when the same problem persists for several days.

For instance, let's say:

file_2016_Oct_01.txt@hostname@YES
file_2016_Oct_02.txt@hostname@YES
file_2016_Oct_03.txt@hostname@NO
file_2016_Oct_04.txt@hostname@NO
file_2016_Oct_05.txt@hostname@YES
file_2016_Oct_06.txt@hostname@NO
file_2016_Oct_07.txt@hostname@NO

Given the scenario above, I want the script to interpret it as 2 failures instead of 4, because sometimes a server may keep the same status for days before recovering, and I want to identify recurrences of the problem rather than just counting the total number of failures.
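
In other words, the counting logic I'm after is roughly this (just a sketch over a plain list of statuses, with the function name made up for illustration):

# A sketch of the counting logic: a failure is only counted when the
# status flips from healthy to failed, so consecutive "NO" days
# collapse into a single recurrence.
def count_recurrences(statuses):
    failures = 0
    previous = "YES"  # assume the server starts out healthy
    for status in statuses:
        if status == "NO" and previous != "NO":
            failures += 1
        previous = status
    return failures

# The October example above is counted as 2 failures, not 4:
print(count_recurrences(["YES", "YES", "NO", "NO", "YES", "NO", "NO"]))  # prints 2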

For the record, this is how I'm going through the files:

import datetime
import glob
import os

# Creates an empty list
history_list = []

# Function to find the files from the last 30 days
def f_findfiles():
    # First define the cut-off day, i.e. how many days back
    # the script will consider for the analysis
    cut_off_day = datetime.datetime.now() - datetime.timedelta(days=30)

    # Loop through all history files from the last 30 days
    for file_name in glob.iglob("/opt/hc/*.txt"):
        filetime = datetime.datetime.fromtimestamp(os.path.getmtime(file_name))
        if filetime > cut_off_day:
            history_list.append(file_name)

# Just included the function below to show how I'm going
# through the files; this is where I got stuck...
def f_openfiles(file_list):
    for file_name in file_list:
        with open(file_name, "r") as handle:
            for line in handle:
                # clean_line holds [file, hostname, status] -- but what next?
                clean_line = line.strip().split("@")

# Main function
def main():
    f_findfiles()
    f_openfiles(history_list)

main()

I'm opening the files using with and reading all the lines from all the files in a for loop, but I'm not sure how to navigate through the data to compare the value from one file with the values from older files.
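
What I think I need is to end up with a structure I can sort chronologically and then scan, something like the sketch below (the date parsing assumes the file names follow the file_2016_Oct_01.txt pattern from the example, and that an English locale is in effect for the month abbreviation):

import datetime

# A sketch of one possible structure: parse each line into a
# (date, hostname, status) tuple so the whole data set can be sorted
# chronologically and then scanned for YES -> NO transitions.
def f_openfiles(file_list):
    records = []
    for file_name in file_list:
        with open(file_name, "r") as handle:
            for line in handle:
                name, hostname, status = line.strip().split("@")
                # The date is embedded in the name, e.g. file_2016_Oct_01.txt
                date = datetime.datetime.strptime(name, "file_%Y_%b_%d.txt")
                records.append((date, hostname, status))
    records.sort()  # tuples sort by their first element, the date
    return records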

I've tried putting all the data in a dictionary, in a list, and just enumerating and comparing, but I've failed with all of these approaches :-(

Any tips on what would be the best approach here? Thank you!

BEST ANSWER

I'd rather handle this with shell utilities (e.g. uniq), but since you prefer Python:

With minimal effort, you can handle it by creating an appropriate dict object with strings (like 'file_2016_Oct_01.txt@hostname@YES') as the keys. While iterating over the log, check whether the corresponding key exists in the dictionary (with if 'file_2016_Oct_01.txt@hostname@YES' in my_log_dict), then assign or increment the dict value accordingly.

A short sample:

data_log = {}

lookup_string = 'foobar'
# Increment the counter if the key is already present, otherwise create it
if lookup_string in data_log:
    data_log[lookup_string] += 1
else:
    data_log[lookup_string] = 1

Alternatively, as a one-liner (these tend to look ugly in Python, so I've added line breaks for readability):

data_log[lookup_string] = data_log[lookup_string] + 1 \
    if lookup_string in data_log \
    else 1
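
A more idiomatic spelling of that pattern is data_log[lookup_string] = data_log.get(lookup_string, 0) + 1. To tie it back to the question, though, the counter should only be bumped when a host flips from YES to NO. Here's a sketch of that, using a made-up sample records list of (date, hostname, status) tuples standing in for the parsed log data:

import datetime
from collections import defaultdict

# Sample stand-in for the parsed log data, sorted chronologically
records = [
    (datetime.date(2016, 10, 1), "hostname", "YES"),
    (datetime.date(2016, 10, 2), "hostname", "YES"),
    (datetime.date(2016, 10, 3), "hostname", "NO"),
    (datetime.date(2016, 10, 4), "hostname", "NO"),
    (datetime.date(2016, 10, 5), "hostname", "YES"),
    (datetime.date(2016, 10, 6), "hostname", "NO"),
    (datetime.date(2016, 10, 7), "hostname", "NO"),
]

ALERT_THRESHOLD = 2  # the question uses 10 over 30 days; lowered so the sample triggers

failure_counts = defaultdict(int)  # hostname -> number of distinct recurrences
last_status = {}                   # hostname -> most recently seen status

for date, hostname, status in records:
    # Count a failure only when the host flips from healthy to failed
    if status == "NO" and last_status.get(hostname) != "NO":
        failure_counts[hostname] += 1
    last_status[hostname] = status

for hostname, count in failure_counts.items():
    if count >= ALERT_THRESHOLD:
        print("ALERT: %s failed %d separate times" % (hostname, count))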