Grouping categories into smaller categories in a column


I want to group the values of a column in Python so that all colours are labelled as colour, all cars are labelled as car, and every other value is labelled as other. The number of other values is very large, so listing them manually is impractical. The car and colour values, however, can be listed manually.

For example, the column looks like this:

Column name
Red
Blue
Green
BMW
Toyata
djdjd
dhfh
sher
dhfg

The result should be:

Column name
Colour
Colour
Colour
Car
Car
Other 
Other
Other
Other

There are 2 best solutions below


I am guessing you mean a pandas DataFrame column when you say "column".

You can make a dictionary as follows:

replace_dict = {'Red': 'Color',
                'Blue': 'Color',
                # ...
                'Toyata': 'Car',
                # ...
                }

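Then apply the replace function to the column and assign the result back, since replace returns a new Series rather than modifying the column in place. Note that values missing from the dictionary are left unchanged, so the "Other" group still has to be handled separately:

df['Column name'] = df['Column name'].replace(replace_dict)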
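
One way to cover the "Other" group without listing every remaining value is to use `map` instead of `replace`: `map` yields NaN for anything that is not in the dictionary, and `fillna` can then label those rows. The following is only a sketch under assumptions: the sample data, the `Group` column name, and the "Colour"/"Car" labels are taken from the question or made up for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the column in the question.
df = pd.DataFrame(
    {"Column name": ["Red", "Blue", "Green", "BMW", "Toyata", "djdjd", "dhfh"]}
)

# Only the known colours and cars are listed here.
replace_dict = {
    "Red": "Colour",
    "Blue": "Colour",
    "Green": "Colour",
    "BMW": "Car",
    "Toyata": "Car",
}

# map() gives NaN for values that are not keys of the dictionary;
# fillna() then turns those NaN entries into "Other".
df["Group"] = df["Column name"].map(replace_dict).fillna("Other")

print(df)
```

Be aware that rows which were already missing (NaN) in the original column would also end up as "Other" with this approach.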

I don't know what type of object your column is, but if you convert it to a list, the following script will work:

ColoursList = ['Blue', 'Green', 'Yellow']
CarsList = ['BMW', 'Mercedes', 'Opel']
ValuesList = ['Blue', 'jfefe', 'bndf', 'Green', 'Mercedes', 'Yellow']
GroupedList = []

for Value in ValuesList:
    if Value in ColoursList:
        # Colour check
        GroupedValue = 'Colour'
    elif Value in CarsList:
        # Car check
        GroupedValue = 'Car'
    else:
        # Anything not listed above falls into the catch-all group
        GroupedValue = 'Other'

    GroupedList.append(GroupedValue)

print(ValuesList)
print(GroupedList)
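
The same list-based idea can also be written with a single dictionary lookup, which avoids scanning each list for every value. This is a minimal sketch, not part of the original answer: `GROUPS` and `group_value` are hypothetical names, and the sample values are reused from the script above.

```python
colours = ['Blue', 'Green', 'Yellow']
cars = ['BMW', 'Mercedes', 'Opel']

# Build one lookup table: every colour maps to 'Colour', every car to 'Car'.
GROUPS = {**dict.fromkeys(colours, 'Colour'), **dict.fromkeys(cars, 'Car')}


def group_value(value, default='Other'):
    """Return the group for a value, or the default if it is not listed."""
    return GROUPS.get(value, default)


values = ['Blue', 'jfefe', 'bndf', 'Green', 'Mercedes', 'Yellow']
grouped = [group_value(v) for v in values]

print(grouped)
```

Because the lookup table is a dictionary, each value is checked in roughly constant time, which helps when the number of "other" values is very large.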