I am using sortedcontainers `SortedKeyList` to store a list of dictionaries. The dictionaries are sorted within the list by the value of a (nested) key (i.e., "token_str").

The `SortedKeyList` appears to be properly stored in `self.tokenized_files` (e.g., as shown below), which has been verified via the debugger.

However, when I attempt to retrieve an element (i.e., dictionary) from the list with a given value of the key (i.e., `token_str`), I get `TypeError: string indices must be integers, not 'str'`.

How can I retrieve an element (i.e., dictionary) from a `SortedKeyList` using the value of the nested key? Preferably, using an efficient search algorithm, such as bisect, rather than iterating through the entire list.
```python
self.sorted_index_by_token_str = None
self.tokenized_files = SortedKeyList([
    {'full_path': '/path/to/file/mdb_00033k__filename1.pdf', 'tokenized_filename': {'basic_filename': 'filename1.pdf', 'token': {'token_str': 'mdb_00033k'}}},
    {'full_path': '/path/to/file/mdb_0027zz__filename2.pdf', 'tokenized_filename': {'basic_filename': 'filename2.pdf', 'token': {'token_str': 'mdb_0027zz'}}},
])

def generate_index(self) -> SortedKeyList:
    """Creates an index comprising a list of dicts of tokenized files sorted by token_str.
    """
    self.sorted_index_by_token_str = SortedKeyList(
        self.tokenized_files,
        key=lambda x: x["tokenized_filename"]["token"]["token_str"],
    )
    # Evaluate: x["tokenized_filename"]["token"]["token_str"]
    # Returns: 'mdb_0027zz'
    return self.sorted_index_by_token_str

def find_tokenized_file(self) -> dict:
    """Finds a tokenized file in the index by performing a binary search on token_str.
    """
    test = self.sorted_index_by_token_str.index("mdb_0027zz")
    # ERROR:
    # test = self.sorted_index_by_token_str.index("mdb_0027zz")
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    # key = self._key(value)
    # ^^^^^^^^^^^^^^^^
    # key=lambda x: x["tokenized_filename"]["token"]["token_str"],
    # ~^^^^^^^^^^^^^^^^^^^^^^
    # TypeError: string indices must be integers, not 'str'
```
The value passed to the lookup needs to be the entire nested dictionary (not just a simple key value, as I believed), because `SortedKeyList` applies the key function to whatever it is given. The `bisect_left` method is then used to determine the index of the element, with a subsequent check to confirm that the element located at that index matches the key value.

Or, using `bisect_key_left` (per @user2357112):
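A minimal sketch of both lookups, using the two sample records from the question. The helper names `token_key`, `make_probe`, `find_by_probe`, and `find_by_key` are hypothetical and exist only for illustration; `bisect_left` and `bisect_key_left` are the `SortedKeyList` methods referred to above.

```python
from typing import Optional

from sortedcontainers import SortedKeyList


def token_key(entry: dict) -> str:
    """Key function: the nested token_str of a tokenized-file dict."""
    return entry["tokenized_filename"]["token"]["token_str"]


def make_probe(token_str: str) -> dict:
    """Hypothetical helper: wrap a bare token_str in the nested shape
    that the key function expects."""
    return {"tokenized_filename": {"token": {"token_str": token_str}}}


def find_by_probe(index: SortedKeyList, token_str: str) -> Optional[dict]:
    """bisect_left applies the key function to its argument, so it is
    given a full nested dict rather than a plain string."""
    pos = index.bisect_left(make_probe(token_str))
    # bisect only yields an insertion point; confirm it is a real match.
    if pos < len(index) and token_key(index[pos]) == token_str:
        return index[pos]
    return None


def find_by_key(index: SortedKeyList, token_str: str) -> Optional[dict]:
    """bisect_key_left takes the key value directly, so no probe dict
    is needed."""
    pos = index.bisect_key_left(token_str)
    if pos < len(index) and token_key(index[pos]) == token_str:
        return index[pos]
    return None


# Sample data copied from the question.
tokenized_files = [
    {'full_path': '/path/to/file/mdb_00033k__filename1.pdf',
     'tokenized_filename': {'basic_filename': 'filename1.pdf',
                            'token': {'token_str': 'mdb_00033k'}}},
    {'full_path': '/path/to/file/mdb_0027zz__filename2.pdf',
     'tokenized_filename': {'basic_filename': 'filename2.pdf',
                            'token': {'token_str': 'mdb_0027zz'}}},
]
index = SortedKeyList(tokenized_files, key=token_key)

match = find_by_key(index, "mdb_0027zz")
missing = find_by_key(index, "mdb_missing")  # None when no entry matches
```

The only difference between the two is what gets passed in: `bisect_left` runs the key function on its argument, so it needs a dict of the right shape, whereas `bisect_key_left` compares the bare string against the stored keys. In both cases the bounds check and equality check are needed, since a bisect call returns an insertion point even when the token is absent.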