I am designing a file integrity monitor in Python. I was able to create a function that iterates through all files in a given directory, hashes the contents of each one, and stores each hash in a baseline.txt file next to the file's path. Now I'm trying to figure out how to compare the hashes in baseline.txt to the current hashes of the files. This is what I have right now:
import hashlib
import os
import time

def comp_baseline():
    # Begin (continuously) monitoring files with saved baseline
    base_dict = {}
    if os.path.exists("baseline.txt"):  # Load file | hash pairs from baseline.txt and store them in a dictionary
        with open("baseline.txt", 'r') as f:
            for line in f:
                key, value = line.strip().split(' | ')
                base_dict[key] = value
    print(base_dict)  # Dictionary created. Keys = files in given directory. Values = hashes of those files
    # Compare each current file's hash to the dictionary:
    # if the path is not a key, the file is not in the baseline
    # if the path is a key but the hash is different, the file has been compromised
    while True:
        time.sleep(1)
        for filename in os.listdir(directory):  # directory and BUF_SIZE are defined elsewhere in the script
            fn = os.path.join(directory, filename)
            if not os.path.isfile(fn):  # Skip subdirectories
                continue
            filehash = hashlib.sha512()  # Create a fresh hashing object for every file
            with open(fn, 'rb') as f2:
                while True:
                    data = f2.read(BUF_SIZE)
                    if not data:  # End of file reached
                        break
                    filehash.update(data)
            if fn not in base_dict:
                print("File does not exist in baseline:", fn)
            elif filehash.hexdigest() != base_dict[fn]:
                print("ALERT: hash does not match baseline:", fn)
            else:
                print("File found")
Edit: I forgot to clarify that I want to load the file paths and hashes into a dictionary so that I can compare each freshly computed hash directly against the dictionary values, which are the saved hashes. I'm just wondering how to compare a hash to the value that belongs to the matching file path. This is what baseline.txt looks like:
/home/kali/Documents/test_files/b.txt | c49b73859752f36533fe7efe8994a697f88b1f9fac06003ca2e7d5d1d97ddb230bebe54fc36610d105625862998126ff5974b40c322d719fd706c1db8d503958
/home/kali/Documents/test_files/e.txt | 2d32d7704de22f9016fc75269ee54a4576864e9983aeb187fe06bf00113e5a01b5c93b8123a1e20bcc1c213afe09ed70e7bf0e4b49df80a046b8f91a7daa0b17
/home/kali/Documents/test_files/a.txt | a07b2986313eff8ebe7d73d4f2432bd9c2a7d7c43867cfcb0712f30868302921c5b09d73cd823c6d15aaa6e66ecd024b11d1f09bdf95178031cfd539621b24b0
/home/kali/Documents/test_files/d.txt | febdef5ebb1eec24417b9341f42f3259f4e5918ebb6993fbd362574a8305b03b46da5046b14e7761ede08cb9c4176e89752d525c2917bf701e139397bc561040
/home/kali/Documents/test_files/c.txt | bc7f9d6af7a95e4ca293bc96e19f8047faa7ceefc516043fc16c29ee673855bda84c84dc57cb0f285301dd21f3fbada2a47548f2fbc03187d95c78eb4612822c
It should compare each hash in the txt file to the current hash of the corresponding file in the directory, and if one doesn't match, it should give an alert.
I appreciate the help!
My approach to checking the files would be to iterate over every line in the 'baseline.txt' file, split each line at the " | ", read the file named on the left and generate its hash, then compare that hash to the value on the right.
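A minimal sketch of that approach, assuming the baseline was written with SHA-512 hex digests (the sample hashes are 128 hex characters long) and the same " | " separator. The names `hash_file` and `check_baseline`, and the 64 KiB `BUF_SIZE`, are placeholders of mine and not taken from the question:

```python
import hashlib

BUF_SIZE = 65536  # assumed chunk size; the question does not show its value


def hash_file(path, buf_size=BUF_SIZE):
    """Return the SHA-512 hex digest of the file at `path`, read in chunks."""
    filehash = hashlib.sha512()  # a new hash object for every file
    with open(path, "rb") as f:
        while True:
            data = f.read(buf_size)
            if not data:  # empty bytes means end of file
                break
            filehash.update(data)
    return filehash.hexdigest()


def check_baseline(baseline_path="baseline.txt"):
    """Return a list of (path, status) tuples for entries that fail the check."""
    problems = []
    with open(baseline_path, "r") as baseline:
        for line in baseline:
            line = line.strip()
            if not line:  # ignore blank lines
                continue
            # Split on the last " | " so the hash is always the right-hand part
            path, saved_hash = line.rsplit(" | ", 1)
            try:
                current_hash = hash_file(path)
            except FileNotFoundError:
                problems.append((path, "missing"))
                continue
            if current_hash != saved_hash:
                problems.append((path, "modified"))
    return problems
```

To monitor continuously, you could call `check_baseline()` inside your `while True:` loop after the `time.sleep(1)` and print an alert for each returned tuple. Note that this only checks files listed in the baseline; a file added to the directory later would not be reported unless you also compare `os.listdir(directory)` against the baseline paths.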