I'm currently optimising my code and I have found a bottleneck. I have a dataframe df with a column 'Numbers' containing integers from 1 to 100. I would like to map those numbers through a lookup table. I know I can use the .map() or .replace() functions, but both seem slow and neither exploits the fact that the values in 'Numbers' are direct indices into my lookup (which is list-like), i.e. I would like to perform the following:
dict_simple = []
for i in range(100):
    dict_simple.append('a' + str(i))
df['Numbers_with_a'] = df['Numbers'].apply(lambda x: dict_simple[x])
Unfortunately, apply is also very slow. Is there any faster way to do it? The dataframe has 50M+ records.
I have tried the .map(), .replace() and .apply() functions from the pandas package, but performance is very poor. I would like to reduce the calculation time.
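For reference, a minimal sketch of the approaches described above, on a small hypothetical dataframe (the column values and mapping here are stand-ins for the real 50M-row data). All three variants dispatch a Python-level call per element, which is why they scale poorly:

```python
import pandas as pd

# Hypothetical small dataframe standing in for the 50M+ row one.
df = pd.DataFrame({"Numbers": [0, 1, 2, 99]})

# The lookup as a plain dict: number -> "a<number>".
mapping = {i: "a" + str(i) for i in range(100)}

# Each of these walks the column element by element:
via_map = df["Numbers"].map(mapping)
via_replace = df["Numbers"].replace(mapping)
via_apply = df["Numbers"].apply(lambda x: mapping[x])
```

All three produce the same mapped column; the cost difference only shows up at scale.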
A pandas.Series has an index that can be used to map one value to another natively in pandas, without the extra expense of calling apply for each row or converting values to the Python int type. Since the numbers you want to map start from zero and a Series indexes from 0 by default, you can index the lookup table positionally. str_map is a Series created from your "a0"... strings. str_map.iloc[df.numbers] uses your numbers as indices, giving you a new Series of the mapped values. That series is indexed by your numbers, so you drop that index and assign the result back to the original dataframe.
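The steps above can be sketched as follows. This assumes df has a default RangeIndex and that every value in 'Numbers' falls inside the positions of str_map (0..99 here); the small dataframe is only illustrative:

```python
import pandas as pd

# Hypothetical stand-in for the large dataframe.
df = pd.DataFrame({"Numbers": [0, 5, 99, 5]})

# Lookup table: position i holds the string "a<i>".
str_map = pd.Series(["a" + str(i) for i in range(100)])

# One vectorised positional lookup for the whole column. The result is
# indexed by the looked-up numbers, so drop that index before assigning
# back, letting the values align with df's default RangeIndex.
df["Numbers_with_a"] = str_map.iloc[df["Numbers"]].reset_index(drop=True)
```

If df's index is not a default RangeIndex, assign str_map.iloc[df["Numbers"]].to_numpy() instead, so no index alignment happens at all.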