I have a dataframe df with transactions where the values in the column Col can be repeated. I use a Counter, dictionary1, to count the frequency of each Col value. Then I would like to run a for loop on a subset of the data and obtain a value pit. I want to create a new dictionary dict1 where the key is the key from dictionary1 and the value is the value of pit. This is the code I have so far:
    from collections import Counter, defaultdict

    dictionary1 = Counter(df['Col'])
    dict1 = defaultdict(int)
    for i in range(len(dictionary1)):
        temp = df[df['Col'] == dictionary1.keys()[i]]
        b = temp['IsBuy'].sum()
        n = temp['IsBuy'].count()
        pit = b/n
        dict1[dictionary1.keys()[i]] = pit
My question is: how can I assign the key and value for dict1 based on the key of dictionary1 and the value obtained from the calculation of pit? In other words, what is the correct way to write the last line of code in the above script?
Thank you.
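To answer the literal question first: a minimal sketch of the loop, assuming Python 3, where dictionary1.keys() returns a view that cannot be indexed by position. Iterating over the Counter's keys directly avoids the indexing altogether. The sample data below is made up for illustration; only the column names Col and IsBuy come from the question.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Col": ["a", "a", "b", "b", "b"],
    "IsBuy": [1, 0, 1, 1, 0],
})

dictionary1 = Counter(df["Col"])
dict1 = {}

# Loop over the keys themselves rather than over positions in keys().
for key in dictionary1:
    temp = df[df["Col"] == key]
    b = temp["IsBuy"].sum()    # total of IsBuy for this key
    n = temp["IsBuy"].count()  # number of non-null IsBuy rows for this key
    dict1[key] = b / n         # pit for this key
```

A plain dict is enough here, because every key is assigned explicitly inside the loop.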
Since you're using pandas, I should point out that the problem you're facing is common enough that there's a built-in way to do it. We call collecting "similar" data into groups and then performing operations on them a groupby operation. It's probably worthwhile reading the tutorial section on the groupby split-apply-combine idiom -- there are lots of neat things you can do!
The pandorable way to compute the
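pit values would be to group by Col and take the mean of IsBuy, since sum divided by count is just the mean. That gives a Series indexed by the Col values, which you could turn into a dictionary if you insisted, for instance with to_dict.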
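As a minimal sketch of the groupby approach, assuming a small made-up DataFrame with the Col and IsBuy columns from the question (the original example data is not available, so these values are purely illustrative):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Col": ["a", "a", "b", "b", "b"],
    "IsBuy": [1, 0, 1, 1, 0],
})

# Split the rows by Col, then take the mean of IsBuy within each group.
# mean() is sum() / count() over non-null values, i.e. the pit value.
pit = df.groupby("Col")["IsBuy"].mean()

# Convert the Series (indexed by Col) into a plain dict if one is needed.
dict1 = pit.to_dict()
```

This replaces both the Counter and the explicit loop: the groupby finds the distinct Col values and computes one pit per group in a single pass.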