Using a Python script, I want to check whether the IP and timestamp values in my CSV file appear anywhere in the line entries of a JSON log file and, if so, write that specific JSON log entry to another file. I tried to make it universal so that it applies to all IP addresses. Here's what the sample CSV file looks like:
"clientip","destip","dest_hostname","timestamp"
"127.0.0.1","0.0.0.0","randomhost","2023-09-09T04:18:22.542Z"
A sample line entry from the JSON log file:
{"log": "09-Sept-2023 rate-limit: info: client @xyz 127.0.0.1", "stream": "stderr", "time": "2023-09-09T04:18:22.542Z"}
It's the lines from the JSON log file that we want to write to the output.txt file when there's a match. The JSON file doesn't have the same fields and organization as the CSV (clientip, destip, dest_hostname, timestamp), but I was hoping I could still at least write lines from the JSON log file to a new file when they match on the clientip (as we see here with 127.0.0.1 in "info: client @xyz 127.0.0.1") and maybe the timestamp.
I tried shell previously but could not get any matches. I tried the join command, join file.csv xyz-json.log > output.txt, but it didn't yield anything, and neither did awk with a condition like "NR==FNR".
That's why I'm trying to get this done in Python now. I'm new to Python as well, but this is what I roughly had in mind, ignoring indentation for now.
import csv
for line in csv
for line in json-logs
if csv == json-logs
print l1 == l2
I would appreciate any help/assistance with this!
One possibility would be to read both the CSV and JSON files into a dataframe, extract any ip values from the JSON log, then do an inner merge with the JSON data on ip and time, and output the rows remaining after the merge:
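As a rough sketch of the same matching idea, here is a version that uses only the standard library instead of a dataframe, with a set lookup standing in for the inner merge. It assumes the CSV has a well-formed header containing clientip and timestamp, that every log line is a valid JSON object with "log" and "time" keys, and that the two timestamps match as exact strings. The names IP_RE, load_keys, matching_entries and filter_logs are made up for illustration.

```python
import csv
import json
import re

# Assumption: client IPs appear as plain IPv4 addresses inside the "log" text.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def load_keys(csv_lines):
    """Return the set of (clientip, timestamp) pairs found in the CSV."""
    reader = csv.DictReader(csv_lines)
    return {(row["clientip"], row["timestamp"]) for row in reader}


def matching_entries(log_lines, keys):
    """Yield raw log lines whose message contains a CSV client IP and whose
    "time" field equals the timestamp from the same CSV row."""
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        ips = IP_RE.findall(str(entry.get("log", "")))
        if any((ip, entry.get("time")) in keys for ip in ips):
            yield line


def filter_logs(csv_path, log_path, out_path):
    """Write every matching log line from log_path to out_path."""
    with open(csv_path, newline="") as f:
        keys = load_keys(f)
    with open(log_path) as logs, open(out_path, "w") as out:
        for line in matching_entries(logs, keys):
            out.write(line + "\n")
```

With the file names from the question, this would be called as filter_logs("file.csv", "xyz-json.log", "output.txt"). If the timestamps in the two files differ in precision or format, the exact string comparison will miss them, so they would need to be parsed or truncated to a common form first.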