- I have a jsonl file containing around 1,000,000 dictionaries.
- I am interested in the dictionaries where the value of field_1 is a string from list_of_strings, which contains around 100,000 strings.
I can hold both in memory at the same time, and I'd like to compare them quickly and efficiently.
My first attempt was:
import jsonlines

matching_dicts = []
key = "field_1"
# Open the JSONL file and iterate over its lines
with jsonlines.open(file_path) as reader:
    for line_number, obj in enumerate(reader):
        # Check if the object has the target field and its value is in list_of_strings
        if key in obj and obj[key] in list_of_strings:
            # If so, append the object and its line number to the results
            matching_dicts.append((obj, line_number))
This is slow. What would be faster?
Preprocess and load the list of strings into a set for faster membership checks:
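A minimal sketch of that change, assuming list_of_strings is a plain Python list of strings and each line of the file is one JSON object. The function name find_matching_dicts is hypothetical, and the standard-library json module is used here in place of jsonlines so the example runs without extra dependencies; the same set lookup works unchanged inside a jsonlines.open loop.

```python
import json


def find_matching_dicts(file_path, list_of_strings, key="field_1"):
    """Return (obj, line_number) pairs whose `key` value is in list_of_strings.

    Hypothetical helper for illustration; names are not from the original code.
    """
    # Build the set once. Membership tests on a set are O(1) on average,
    # whereas `x in some_list` scans the list element by element.
    wanted = set(list_of_strings)

    matching_dicts = []
    with open(file_path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            obj = json.loads(line)
            value = obj.get(key)  # None when the field is missing
            # Only strings can match; this also avoids hashing lists or dicts.
            if isinstance(value, str) and value in wanted:
                matching_dicts.append((obj, line_number))
    return matching_dicts
```

With roughly 1,000,000 records and 100,000 strings, the list version can do up to about 100,000 comparisons per record, while the set version does a single hash lookup per record, so most of the remaining time should go to parsing the JSON itself.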