Error when calling groupby().mean() in pandas, but no error when passing an arbitrary string

import pandas as pd
# Assuming df is my DataFrame
data = {
    'Make': ['Toyota', 'Honda', 'Toyota', 'BMW', 'Nissan', 'Toyota', 'Honda', 'Honda', 'Toyota', 'Nissan'],
    'Colour': ['White', 'Red', 'Blue', 'Black', 'White', 'Green', 'Blue', 'Blue', 'White', 'White'],
    'Odometer (KM)': [150043, 87899, 32549, 11179, 213095, 99213, 45698, 54738, 60000, 31600],
    'Doors': [4, 4, 3, 5, 4, 4, 4, 4, 4, 4],
    'Price': ['$4,000.00', '$5,000.00', '$7,000.00', '$22,000.00', '$3,500.00', '$4,500.00', '$7,500.00', '$7,000.00', '$6,250.00', '$9,700.00']
}

df = pd.DataFrame(data)

# Group by "Make" and calculate the mean for each numeric column
result = df.groupby("Make").mean()

# Display the result
print(result)   

This is my code. As far as I can tell it follows the pandas documentation, but it still raises an error. If I pass any string to mean(), it works and gives the right answer, as in the code below (everything was tested in a Jupyter notebook):

import pandas as pd

# Assuming df is my DataFrame
data = {
    'Make': ['Toyota', 'Honda', 'Toyota', 'BMW', 'Nissan', 'Toyota', 'Honda', 'Honda', 'Toyota', 'Nissan'],
    'Colour': ['White', 'Red', 'Blue', 'Black', 'White', 'Green', 'Blue', 'Blue', 'White', 'White'],
    'Odometer (KM)': [150043, 87899, 32549, 11179, 213095, 99213, 45698, 54738, 60000, 31600],
    'Doors': [4, 4, 3, 5, 4, 4, 4, 4, 4, 4],
    'Price': ['$4,000.00', '$5,000.00', '$7,000.00', '$22,000.00', '$3,500.00', '$4,500.00', '$7,500.00', '$7,000.00', '$6,250.00', '$9,700.00']
}

df = pd.DataFrame(data)

# Group by "Make" and calculate the mean for each numeric column
result = df.groupby("Make").mean('dhshsdhs')

# Display the result
print(result)

Why does this code work?

I wrote the code the way the pandas documentation describes and got an error, yet the version that looks wrong gives me the right result.

There is 1 best solution below

mozway

groupby.mean accepts a numeric_only parameter, which is False by default (since pandas 2.0).

This is what triggers your error when running df.groupby('Make').mean(): some of your columns contain strings (e.g. 'White' in Colour or '$4,000.00' in Price), not numbers.
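One way to see which columns get in the way is to look at the dtypes. This is only a minimal sketch, assuming a shortened copy of the DataFrame from the question (first four rows); numeric_cols is an illustrative name, not something from the original code:

```python
import pandas as pd

# Shortened copy of the DataFrame from the question (first four rows only)
df = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota', 'BMW'],
    'Colour': ['White', 'Red', 'Blue', 'Black'],
    'Odometer (KM)': [150043, 87899, 32549, 11179],
    'Doors': [4, 4, 3, 5],
    'Price': ['$4,000.00', '$5,000.00', '$7,000.00', '$22,000.00'],
})

# Colour and Price hold strings, so they do not have a numeric dtype
print(df.dtypes)

# Columns that have a numeric dtype and can therefore be averaged
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)
```

Here only 'Odometer (KM)' and 'Doors' are numeric, so those are the columns you get back with numeric_only=True.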

By passing a random string as the first positional argument, you are effectively setting numeric_only=True, since a non-empty string such as 'dhshsdhs' is truthy (bool('dhshsdhs') is True).
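The truthiness point can be checked directly. This is a rough sketch, assuming a shortened copy of the question's data without the Price column; the names flag, explicit and via_flag are made up for illustration:

```python
import pandas as pd

# Shortened copy of the question's data (first four rows, Price left out)
df = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota', 'BMW'],
    'Colour': ['White', 'Red', 'Blue', 'Black'],
    'Odometer (KM)': [150043, 87899, 32549, 11179],
    'Doors': [4, 4, 3, 5],
})

# Any non-empty string is truthy; only the empty string is falsy
flag = bool('dhshsdhs')
print(flag, bool(''))

# Passing the option by keyword makes the intent explicit
explicit = df.groupby('Make').mean(numeric_only=True)

# Same call with the truth value of the random string
via_flag = df.groupby('Make').mean(numeric_only=flag)

print(explicit.equals(via_flag))
```

Writing numeric_only=True by keyword is probably the safer habit, since it does not depend on what happens to be the first positional parameter.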

As a side note, if you want to compute the mean of the Price, first convert to numeric:

(df.assign(Price=lambda d: pd.to_numeric(d['Price'].str.replace(r'[$,]', '', regex=True)))
   .groupby('Make').mean(numeric_only=True)
)

Output:

        Odometer (KM)  Doors    Price
Make                                 
BMW      11179.000000   5.00  22000.0
Honda    62778.333333   4.00   6500.0
Nissan  122347.500000   4.00   6600.0
Toyota   85451.250000   3.75   5437.5