Pandas - How to sort values for multiple columns but ensure every value in columns appear in order?

78 Views Asked by Laia Ferrer-Argemí At 30 May 2023 at 16:25

I have df with multiple columns (categories) with 5-8 numbers (groups) per category. I am trying to make a parallel category plot with plotly (https://plotly.com/python/parallel-categories-diagram/) with a set order for each category. The groups' order in the plot is the order of appearace in the df. All works well after sorting the dataframe except when some combination is missing, which messes the order of the missing group.

Example df:

df.sort_values(by=['A','B','C','D'],ascending=True,inplace=True)
A     B     C     D
0     0     0     0
0     0     0     2
0     0     1     0
0     0     1     1

So for category D, the group 2 will be plotted before group 1. I need a way to ensure all groups apear in order but the rows have to be mantained since it is groups assigned to users. Is there a way to achieve the following result instead?

A     B     C     D
0     0     0     0
0     0     1     1
0     0     0     2
0     0     1     0

Bonus points if a fake row is added when the ordering becomes impossible. For example if we had

df.sort_values(by=['A','B','C','D'],ascending=True,inplace=True)
A     B     C     D
0     0     0     0
0     0     0     2
0     0     1     0
0     0     2     1

there would be no way to have Column D in order without breaking the columns, so in this case I would like to add a fake row

df.sort_values(by=['A','B','C','D'],ascending=True,inplace=True)
A     B     C     D
0     0     0     0
0     0     0     1
0     0     0     2
0     0     1     0
0     0     1     1

Or if anyone knows another way to set up the order in the parallel categories graph so it never moves that would also work.

Thank you.

Edited for clarity

Original Q&A

There are 1 best solutions below

PTQuoc On 30 May 2023 at 17:05

Your requirement is not sort, but to change entire data

Simply run sort for each Series and use concat to stack them

Sample data:

df = pd.DataFrame({'c1':[3,1,4,2],
                   'c2':['x','y','x','y'],
                   'val':[8,4,5,9]})

temdf = pd.DataFrame()

for col in ['c1', 'c2', 'val']:
    temcol = df[f'{col}'].copy()
    temcol.sort_values(inplace=True, ignore_index=True)
    temdf = pd.concat([temdf, temcol], axis=1)

Pandas - How to sort values for multiple columns but ensure every value in columns appear in order?

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in PLOTLY

Related Questions in PARALLEL-COORDINATES

Trending Questions

Popular # Hahtags

Popular Questions