The behavior of my function changes depending on whether I declare a string variable inside the function or pass it in from outside. Specifically, declaring the string inside the function and converting it to a DataFrame with pd.read_csv(StringIO(...)) results in a KeyError.
import pandas as pd
from io import StringIO

def test(csv, search):
    df = pd.read_csv(StringIO(csv))
    result = df.loc[df['code'].isin(search), ['code']]
    return result

search_list = ['CD456', 'EF789']
csv_str = """\
code,name
AB123,David
CD456,Larry
EF789,Jones
"""
print(test(csv_str, search_list))
The above works as one would expect, returning:
    code
1  CD456
2  EF789
However, simply moving the CSV string declaration inside the function, like so:
import pandas as pd
from io import StringIO

def test2(search):
    csv_str2 = """\
    code,name
    AB123,David
    CD456,Larry
    EF789,Jones
    """
    df2 = pd.read_csv(StringIO(csv_str2))
    result2 = df2.loc[df2['code'].isin(search), ['code']]
    return result2

search_list2 = ['CD456', 'EF789']
print(test2(search_list2))
produces:
Traceback (most recent call last):
  File "\pandas\core\indexes\base.py", line 3803, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'code'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "\lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-b12167844498>", line 17, in <module>
    print(test2(search_list2))
  File "<ipython-input-3-b12167844498>", line 12, in test2
    result2 = df2.loc[df2['code'].isin(search), ['code']]
  File "\lib\site-packages\pandas\core\frame.py", line 3804, in __getitem__
    indexer = self.columns.get_loc(key)
  File "\lib\site-packages\pandas\core\indexes\base.py", line 3805, in get_loc
    raise KeyError(key) from err
KeyError: 'code'
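
One way to narrow this down might be to look at the column labels pandas actually parsed in each case. The sketch below assumes the triple-quoted string is indented together with the function body, as it would be after moving it inside the function; under that assumption the indentation is part of the string itself and ends up in the header. The helper names parse_indented and parse_dedented are hypothetical, and textwrap.dedent is shown only as one possible way to strip the common indentation before parsing, not necessarily the best fix.

```python
import textwrap
from io import StringIO

import pandas as pd


def parse_indented():
    # Assumption: the literal is indented with the function body, so every
    # line of the string (including the header) starts with four spaces.
    csv_str = """\
    code,name
    AB123,David
    CD456,Larry
    EF789,Jones
    """
    return pd.read_csv(StringIO(csv_str))


def parse_dedented():
    # Same literal, but the common leading whitespace is removed first.
    csv_str = """\
    code,name
    AB123,David
    CD456,Larry
    EF789,Jones
    """
    return pd.read_csv(StringIO(textwrap.dedent(csv_str)))


# Compare the parsed column labels; repr-style list output makes any
# leading whitespace in a label visible.
print(parse_indented().columns.tolist())
print(parse_dedented().columns.tolist())
```

If the first list shows a label with leading spaces instead of exactly 'code', that would explain why df2['code'] raises a KeyError only when the string is declared inside the function.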