How should I compare the hashes in my baseline.txt file with the hashes of the files in a directory?


I am designing a file integrity monitor in Python. I was able to create a function that iterates through all files in a given directory, hashes the contents of each one, and stores the hashes in a baseline.txt file next to the file's path. Now I'm trying to figure out how to compare the hashes within baseline.txt to the current hashes of the files. This is what I have right now.

def comp_baseline():
    # Begin (continuously) monitoring files with saved baseline
    filehash = hashlib.sha512() # Create a file hashing object
    base_dict = {}
    if os.path.exists("baseline.txt"): # Load file|hash from baseline.txt and store them in a dictionary
        f = open("baseline.txt", 'r')
        for line in f:
            key, value = line.strip().split(' | ')
            base_dict[key] = value
        print(base_dict) # Dictionary created. Keys = files in given directory. Values = hashes of those files

        # Compare current file: hash to dictionary to check if it's in there
        # if not, it means the file does not exist
        # if the file exists but the hash is different, the file has been compromised

        while True:
            time.sleep(1)
            for filename in os.listdir(directory):
                fn = os.path.join(directory, filename)
                with open(fn, 'rb') as f2:
                    while True:
                        data = f2.read(BUF_SIZE)
                        filehash.update(data)
                        if fn in base_dict[key]:
                            print("File found")        
                    
    else:
        print("Baseline file does not exist in local directory.")

Edit: I forgot to clarify that I want to load the file paths and hashes into a dictionary so that I can compare each file's current hash directly against the dictionary values, which are the saved hashes. I'm just wondering how to compare a hash to the value that belongs to the matching file path. This is what baseline.txt looks like:

/home/kali/Documents/test_files/b.txt | c49b73859752f36533fe7efe8994a697f88b1f9fac06003ca2e7d5d1d97ddb230bebe54fc36610d105625862998126ff5974b40c322d719fd706c1db8d503958
/home/kali/Documents/test_files/e.txt | 2d32d7704de22f9016fc75269ee54a4576864e9983aeb187fe06bf00113e5a01b5c93b8123a1e20bcc1c213afe09ed70e7bf0e4b49df80a046b8f91a7daa0b17
/home/kali/Documents/test_files/a.txt | a07b2986313eff8ebe7d73d4f2432bd9c2a7d7c43867cfcb0712f30868302921c5b09d73cd823c6d15aaa6e66ecd024b11d1f09bdf95178031cfd539621b24b0
/home/kali/Documents/test_files/d.txt | febdef5ebb1eec24417b9341f42f3259f4e5918ebb6993fbd362574a8305b03b46da5046b14e7761ede08cb9c4176e89752d525c2917bf701e139397bc561040
/home/kali/Documents/test_files/c.txt | bc7f9d6af7a95e4ca293bc96e19f8047faa7ceefc516043fc16c29ee673855bda84c84dc57cb0f285301dd21f3fbada2a47548f2fbc03187d95c78eb4612822c

It should compare each hash in the txt file to the current hash of the corresponding file in the directory, and if one doesn't match, it should give an alert.

I appreciate the help!


There is 1 best solution below

Reisen On

My approach to checking the files would be to iterate over every line in the 'baseline.txt' file, split each line at the " | ", read the file named on the left and generate its hash, then compare that to the value on the right.

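import hashlib

# read baseline
with open("baseline.txt", "r") as f:
    # loop over every file path and recorded hash
    for line in f:
        # fp = file path; strip the trailing newline before splitting
        fp, saved_hash = line.strip().split(" | ")

        with open(fp, 'rb') as tf:
            # the baseline holds SHA-512 hex digests, so hash the same way
            # and compare hex strings rather than the hash object itself
            test_hash = hashlib.sha512(tf.read()).hexdigest()
        if test_hash != saved_hash:
            print(f"{fp!r} has a non matching hash")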
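
If you would rather keep the dictionary from the question and walk the directory, something like the following might work. It is only a sketch under a few assumptions: the helper names (hash_file, load_baseline, check_directory) are made up for illustration, BUF_SIZE is a guessed value since the question doesn't show it, and the baseline is assumed to hold SHA-512 hex digests in the "path | hash" format shown above. The main points are to create a fresh hash object for every file, stop reading when read() returns empty bytes, and look the hash up with base_dict[fn] instead of reusing the leftover key variable.

```python
import hashlib
import os

BUF_SIZE = 65536  # assumed chunk size; the question does not show its value


def hash_file(path):
    """Return the SHA-512 hex digest of a file, read in chunks."""
    file_hash = hashlib.sha512()  # fresh object per file, never shared
    with open(path, "rb") as f:
        while True:
            data = f.read(BUF_SIZE)
            if not data:  # empty bytes means end of file
                break
            file_hash.update(data)
    return file_hash.hexdigest()


def load_baseline(baseline_path):
    """Parse 'path | hash' lines into a {path: hash} dictionary."""
    base_dict = {}
    with open(baseline_path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            # split on the last separator so the hash is always the final field
            path, saved_hash = line.rsplit(" | ", 1)
            base_dict[path] = saved_hash
    return base_dict


def check_directory(directory, base_dict):
    """Return a list of (path, status) alerts for one pass over the directory."""
    alerts = []
    seen = set()
    for filename in sorted(os.listdir(directory)):
        fn = os.path.join(directory, filename)
        if not os.path.isfile(fn):  # ignore subdirectories
            continue
        seen.add(fn)
        if fn not in base_dict:
            alerts.append((fn, "new file"))
        elif hash_file(fn) != base_dict[fn]:
            alerts.append((fn, "hash mismatch"))
    # anything recorded in the baseline but not seen on disk has gone missing
    for path in base_dict:
        if path not in seen:
            alerts.append((path, "missing"))
    return alerts
```

For continuous monitoring you could call check_directory(directory, load_baseline("baseline.txt")) inside your existing while True / time.sleep(1) loop and print whatever alerts come back. Note that the paths in baseline.txt have to be built the same way as os.path.join(directory, filename) for the dictionary lookups to match.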