I have a df with multiple columns (categories), each with 5-8 numbers (groups). I am trying to make a parallel categories plot with plotly (https://plotly.com/python/parallel-categories-diagram/) with a set order for each category. The order of the groups in the plot is their order of appearance in the df. All works well after sorting the dataframe, except when some combination is missing, which messes up the order of the missing group.
Example df:
df.sort_values(by=['A','B','C','D'],ascending=True,inplace=True)
A B C D
0 0 0 0
0 0 0 2
0 0 1 0
0 0 1 1
So for category D, group 2 will be plotted before group 1. I need a way to ensure all groups appear in order, but the rows have to be maintained since they are groups assigned to users. Is there a way to achieve the following result instead?
A B C D
0 0 0 0
0 0 1 1
0 0 0 2
0 0 1 0
Bonus points if a fake row is added when the ordering becomes impossible. For example if we had
df.sort_values(by=['A','B','C','D'],ascending=True,inplace=True)
A B C D
0 0 0 0
0 0 0 2
0 0 1 0
0 0 2 1
there would be no way to have Column D in order without breaking the columns, so in this case I would like to add a fake row
df.sort_values(by=['A','B','C','D'],ascending=True,inplace=True)
A B C D
0 0 0 0
0 0 0 1
0 0 0 2
0 0 1 0
0 0 1 1
Or if anyone knows another way to set up the order in the parallel categories graph so it never moves that would also work.
Thank you.
Edited for clarity
Your requirement is not sort, but to change the entire data.
Simply run sort for each Series and use concat to stack them.
Sample data:
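A minimal sketch of the per-column sort and concat described above, assuming pandas and using the first example table from the question as hypothetical sample data (the variable name sorted_df is made up for illustration). Note that sorting every column on its own decouples the values in each row, so the original row combinations are not preserved:

```python
import pandas as pd

# Hypothetical sample data: the first example table from the question.
df = pd.DataFrame(
    {
        "A": [0, 0, 0, 0],
        "B": [0, 0, 0, 0],
        "C": [0, 0, 1, 1],
        "D": [0, 2, 0, 1],
    }
)

# Sort each column (Series) independently, drop its old index so the
# sorted values line up by position, then stack the columns side by side.
sorted_df = pd.concat(
    [df[col].sort_values().reset_index(drop=True) for col in df.columns],
    axis=1,
)

print(sorted_df)
```

With this data, column D comes out as 0, 0, 1, 2, so every group first appears in ascending order, but the rows no longer match the original user assignments.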