Pandas - How to sort values for multiple columns but ensure every value in columns appear in order?

78 Views Asked by At

I have df with multiple columns (categories) with 5-8 numbers (groups) per category. I am trying to make a parallel category plot with plotly (https://plotly.com/python/parallel-categories-diagram/) with a set order for each category. The groups' order in the plot is the order of appearace in the df. All works well after sorting the dataframe except when some combination is missing, which messes the order of the missing group.

Example df:

df.sort_values(by=['A','B','C','D'],ascending=True,inplace=True)
A     B     C     D
0     0     0     0
0     0     0     2
0     0     1     0
0     0     1     1

So for category D, the group 2 will be plotted before group 1. I need a way to ensure all groups apear in order but the rows have to be mantained since it is groups assigned to users. Is there a way to achieve the following result instead?

A     B     C     D
0     0     0     0
0     0     1     1
0     0     0     2
0     0     1     0

Bonus points if a fake row is added when the ordering becomes impossible. For example if we had

df.sort_values(by=['A','B','C','D'],ascending=True,inplace=True)
A     B     C     D
0     0     0     0
0     0     0     2
0     0     1     0
0     0     2     1

there would be no way to have Column D in order without breaking the columns, so in this case I would like to add a fake row

df.sort_values(by=['A','B','C','D'],ascending=True,inplace=True)
A     B     C     D
0     0     0     0
0     0     0     1
0     0     0     2
0     0     1     0
0     0     1     1

Or if anyone knows another way to set up the order in the parallel categories graph so it never moves that would also work.

Thank you.

Edited for clarity

1

There are 1 best solutions below

4
PTQuoc On

Your requirement is not sort, but to change entire data

Simply run sort for each Series and use concat to stack them

Sample data:

df = pd.DataFrame({'c1':[3,1,4,2],
                   'c2':['x','y','x','y'],
                   'val':[8,4,5,9]})

temdf = pd.DataFrame()

for col in ['c1', 'c2', 'val']:
    temcol = df[f'{col}'].copy()
    temcol.sort_values(inplace=True, ignore_index=True)
    temdf = pd.concat([temdf, temcol], axis=1)