How to group lists and evaluate mean square error?

Asked by Michael on 23 September 2022 at 14:53

I'm writing a custom metric function, and here are the steps I implemented:

1. I have a list of floats in preds and a list of 0/1 integers in target.
2. I round preds.
3. I need to group by the rounded preds.
4. I compute the mean target value for each group.
5. I compute the MSE between the grouped preds and those mean target values.
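Written out, steps 2 to 5 compute the quantity below. The notation is introduced here for illustration: p_i and y_i are the i-th prediction and target, v_1, ..., v_K are the K distinct rounded predictions, and G_k is the set of samples whose rounded prediction equals v_k. Each distinct rounded value contributes one term, regardless of how many samples fall into it, so this is an unweighted MSE over groups.

$$
G_k = \{\, i : \operatorname{round}(p_i, 2) = v_k \,\}, \qquad
\bar{y}_k = \frac{1}{|G_k|} \sum_{i \in G_k} y_i, \qquad
\mathrm{MSE} = \frac{1}{K} \sum_{k=1}^{K} \left( v_k - \bar{y}_k \right)^2
$$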

This is how df looks before the groupby:

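import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

# Round each prediction to 2 decimals
rounded = [np.round(x, 2) for x in preds]

df = pd.DataFrame({'target': target, 'preds': rounded})

# Mean target value for each distinct rounded prediction
df = df.groupby('preds')['target'].mean().reset_index()

mse = mean_squared_error(df['target'], df['preds'])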
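Since the goal is a custom metric, the pandas steps above can be wrapped in a single function. This is a minimal sketch: the name grouped_mse and the decimals parameter are hypothetical, and the final MSE is computed with NumPy directly, which gives the same value as sklearn.metrics.mean_squared_error for two 1-D inputs.

```python
import numpy as np
import pandas as pd


def grouped_mse(preds, target, decimals=2):
    """Hypothetical helper wrapping the steps from the question.

    Rounds the predictions, averages the 0/1 targets within each rounded
    value, and returns the unweighted MSE between each rounded value and
    its group's mean target.
    """
    df = pd.DataFrame({"preds": np.round(preds, decimals), "target": target})
    grouped = df.groupby("preds")["target"].mean().reset_index()
    # Same value sklearn.metrics.mean_squared_error would return here
    return float(np.mean((grouped["preds"] - grouped["target"]) ** 2))


print(grouped_mse([0.101, 0.099, 0.5, 0.504], [1, 0, 1, 1]))
```

With this input the rounded values form two groups, 0.1 (mean target 0.5) and 0.5 (mean target 1.0), so the result is ((0.1 - 0.5)^2 + (0.5 - 1.0)^2) / 2 = 0.205, up to floating-point rounding.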

And this is how it looks after the groupby and mean() (I can't display the groupby object itself properly):

Basically, I don't know how to do a groupby on two Python lists.

I grouped one list like this:

from itertools import groupby
gr_list = [list(j) for i, j in groupby(rounded)]  # groups only consecutive equal values

But I have no idea how to group the second list (target) according to the gr_list grouping.
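One way to group the second list along with the first, in plain Python, is to zip the two lists into (prediction, target) pairs and group the pairs by their first element. itertools.groupby only merges consecutive equal keys, so the pairs are sorted first. A minimal sketch; the function name group_means is hypothetical:

```python
from itertools import groupby
from operator import itemgetter


def group_means(rounded, target):
    """Mean target value per distinct rounded prediction (sorted by prediction).

    itertools.groupby only merges *consecutive* equal keys, so the
    (prediction, target) pairs are sorted by prediction first.
    """
    pairs = sorted(zip(rounded, target), key=itemgetter(0))
    keys, means = [], []
    for key, group in groupby(pairs, key=itemgetter(0)):
        values = [t for _, t in group]
        keys.append(key)
        means.append(sum(values) / len(values))
    return keys, means


rounded = [0.1, 0.5, 0.1, 0.5]
target = [1, 1, 0, 1]
keys, means = group_means(rounded, target)
# Unweighted MSE between each rounded value and its group's mean target
mse = sum((k - m) ** 2 for k, m in zip(keys, means)) / len(keys)
print(keys, means, mse)
```

Here keys is [0.1, 0.5] and means is [0.5, 1.0], so mse is 0.205 up to floating-point rounding; keys and means can also be passed straight to mean_squared_error.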


Michael On 24 September 2022 at 09:11

Not the cleanest code, but I managed to do it like this:

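from collections import defaultdict

from sklearn.metrics import mean_squared_error

# Collect the target values for each rounded prediction
d = defaultdict(list)
for i, item in enumerate(rounded):  # rounded is the list of rounded preds
    d[item].append(target[i])

# Mean target value per rounded prediction
meanDict = {}
for k, v in d.items():
    meanDict[k] = sum(v) / float(len(v))

keys, values = zip(*meanDict.items())

mse = mean_squared_error(values, keys)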

Laurent On 25 September 2022 at 06:47

Here is a reproducible example of a more idiomatic way to do what you are trying to achieve, if I understand correctly:

import random

import pandas as pd

preds = [random.random() for _ in range(1_000)]
target = [random.randint(0, 1) for _ in range(1_000)]

df = pd.DataFrame({"preds": preds, "target": target})

# Steps 1 to 4 of your post: round preds, group by them, average target
df = df.round({"preds": 2}).groupby("preds").mean().reset_index()

print(df)
# Output
     preds    target
0     0.00  1.000000
1     0.01  0.555556
2     0.02  0.375000
3     0.03  0.375000
4     0.04  0.416667
..     ...       ...
96    0.96  0.666667
97    0.97  0.500000
98    0.98  0.375000
99    0.99  0.461538
100   1.00  0.285714

from sklearn.metrics import mean_squared_error

# Step 5
print(mean_squared_error(df["preds"], df["target"]))  # e.g. 0.1084811098077257; varies with the random data
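If pandas and scikit-learn are unwanted dependencies inside the metric, the same grouping can be sketched with NumPy alone: np.unique(..., return_inverse=True) maps every sample to the index of its rounded value, and np.bincount sums the targets and counts the samples per group. The function name grouped_mse_numpy is hypothetical, and the random data below only mirrors the example above.

```python
import numpy as np


def grouped_mse_numpy(preds, target, decimals=2):
    """Hypothetical NumPy-only variant: no pandas or scikit-learn needed."""
    rounded = np.round(np.asarray(preds, dtype=float), decimals)
    target = np.asarray(target, dtype=float)
    # keys: sorted distinct rounded values; inverse[i]: group index of sample i
    keys, inverse = np.unique(rounded, return_inverse=True)
    # Per-group sum of targets divided by per-group count = per-group mean
    means = np.bincount(inverse, weights=target) / np.bincount(inverse)
    return float(np.mean((keys - means) ** 2))


rng = np.random.default_rng(0)
preds = rng.random(1_000)
target = rng.integers(0, 2, size=1_000)
print(grouped_mse_numpy(preds, target))
```

Because np.unique returns the keys sorted, the per-group means line up with the same ordering that groupby("preds") produces in the pandas version.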
