Swap parts of columns in a pandas DataFrame


I have to manage a dataset in which the three series swap places when I plot it. (With pandas' default colors, blue, orange and green are columns "A", "B" and "C".) Around x=17, the values of the "orange" column take the place where the "green" column's data should be. Likewise, the "blue" data take the place of the "orange" data, and the "green" data take the place of the "blue" data. Later in the graph (around x=24) the swap is different. How can I get the data back into the right columns?

The swap point is not always the same, so I can't just swap fixed parts of the columns. My idea is to check the difference between consecutive points: when the difference is larger than some threshold, the point is probably a swap point. However, this does not always work, because most of the graphs have non-linear behavior. A typical dataset contains many more rows, so I am looking for a solution that is as parameterized as possible. This is the data behind the plot described above:

import pandas as pd

col1 = [8724.96757035, 8720.86855769, 8713.03560178, 8711.77188717,
       8723.40967556, 8717.95864342, 8719.46206709, 8716.15746255,
       8715.83456161, 8722.05038594, 8721.822529  , 8714.29076839,
       8721.68118216, 8714.94677413, 8706.33839393, 8719.94888389,
       8715.71175774, 8480.37544428, 9151.63757245, 9138.71268152,
       9127.43234993, 9146.51437639, 9148.00997757, 9130.06677617,
       9151.43128313, 8481.34668127, 8482.40548913, 8481.96440291,
       8481.39530663, 8482.7611363 , 8481.26267875, 8480.71911933,
       8481.02279341]

col2 = [8718.4606092 , 9150.29254687, 9130.86473512, 9140.34929925,
       9142.43843709, 9158.33993226, 9148.70914607, 9164.89441174,
       9145.08470894, 9147.82723909, 9132.61236281, 9200.58503831,
       9129.96054189, 9135.65207477, 9165.43826932, 9145.35463759,
       9134.02400092, 8481.58635709, 8480.90717793, 8479.96295137,
       8483.73891949, 8481.93224816, 8482.40478411, 8481.96627135,
       8481.34086757, 8722.99646005, 8736.61137791, 8724.85719973,
       8721.86321039, 8723.91810368, 8720.82987529, 8720.19864748,
       8720.00514769]

col3 = [9157.20772734, 8481.17028812, 8479.95897581, 8481.66854465,
       8481.12688288, 8481.30670312, 8480.84656953, 8483.54011535,
       8481.81742774, 8479.23373517, 8480.44659188, 8480.90515565,
       8481.35596211, 8479.94614036, 8480.12735803, 8482.70698043,
       8481.50464731, 8725.55716505, 8712.41651697, 8737.46352274,
       8719.20402175, 8710.77791026, 8721.07604204, 8718.88881952,
       8720.0611123 , 9158.13239686, 9158.70309418, 9185.89920375,
       9189.72527817, 9153.04424809, 9152.17774172, 9148.59275477,
       9133.33557359]

df = pd.DataFrame({"A":col1, "B":col2, "C":col3})

Any suggestions on that would be much appreciated. Thanks in advance.
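A minimal sketch of the difference-based idea described in the question: flag every row where any column jumps by more than a threshold relative to the previous row. The helper name `find_swap_points` and the `threshold` values are hypothetical; the threshold has to be tuned per dataset, and steep but genuine changes in a non-linear curve can be flagged as well.

```python
import pandas as pd


def find_swap_points(df: pd.DataFrame, threshold: float) -> list:
    """Return the index labels of rows where any column jumps by more than `threshold`.

    Hypothetical helper: `threshold` must be chosen per dataset, and steep
    (non-linear) stretches of a real curve can be reported as false positives.
    """
    jumps = df.diff().abs()  # change of every column relative to the previous row (row 0 is NaN)
    is_swap = (jumps > threshold).any(axis=1)  # NaN > threshold is False, so row 0 is never flagged
    return is_swap[is_swap].index.tolist()


# Small made-up example: columns A and B exchange levels from row 3 onwards
demo = pd.DataFrame({"A": [10.0, 10.2, 10.1, 50.3, 50.1],
                     "B": [50.0, 50.4, 50.2, 10.3, 10.0]})
print(find_swap_points(demo, threshold=20))
```

Applied to the question's `df` with `threshold=100`, this flags rows 1, 17, 18 and 25: the jumps around x=17 and x=24 described in the question, plus a jump between the first two rows. It only locates the swaps; deciding which values belong to which column is still a separate step.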


Accepted answer:

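One simple solution is to sort the values within each row, so that the smallest value always ends up in column "A" and the largest in column "C". This relies on the three series staying in separate value ranges, i.e. never genuinely crossing each other.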

# Sort the three values in each row ascending, keeping the original column names and index
df2 = pd.DataFrame(df.apply(sorted, axis=1).to_list(), columns=df.columns, index=df.index)
df2.plot()

The resulting plot (image not reproduced here) has no crossing lines, since A ≤ B ≤ C holds in every row.
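The same row-wise sort can also be written without `apply`, using `numpy.sort` along `axis=1`. This is a sketch and `sort_rows` is a hypothetical helper name; it avoids calling Python's `sorted` once per row, which may help with the larger datasets mentioned in the question.

```python
import numpy as np
import pandas as pd


def sort_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Sort the values inside each row ascending; column names and index are kept.

    Same idea as the answer above, vectorized with NumPy. Only valid while the
    series never cross, because sorting relabels values by rank in every row.
    """
    return pd.DataFrame(np.sort(df.to_numpy(), axis=1), index=df.index, columns=df.columns)


# Small made-up example: the second row has its values in a different column order
demo = pd.DataFrame({"A": [10.0, 50.3], "B": [50.0, 10.3], "C": [30.0, 30.1]})
print(sort_rows(demo))
```

After sorting, column "A" holds the lowest band, "B" the middle one and "C" the highest, whatever order the values had originally.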

Another answer:

Try

df.plot(y=["A", "B", "C"])