How to convert R's TukeyHSD table into a correlation matrix in Python using pandas

I have recently exported a table from R's TukeyHSD test to obtain the p-values for various time groups (0, 5, 10, 20, 30, 40, 50, 60). I'm curious if there's a method to transform this into a correlation matrix, where each axis represents the time groups and corresponds to the respective p-value.

The table includes an index indicating the correspondence between the different time groups (e.g., 5-10 or 10-50). I've imported it as a dataframe into Python. Is there a way to rearrange the dataframe as depicted below?

        p adj
Groups  
50-0    2.815526e-13
60-0    2.855494e-13
20-0    4.764197e-08
50-5    1.712389e-05
50-10   1.483440e-04
50-40   1.643480e-04
60-5    5.873007e-04
60-10   5.218047e-03
60-40   5.613566e-03
10-0    6.878476e-03
40-0    1.270855e-02
20-5    7.380859e-02
50-20   1.574372e-01
40-20   3.264569e-01
20-10   3.369147e-01
5-0     3.816166e-01
60-50   7.301423e-01
60-20   8.503578e-01
10-5    9.731384e-01
40-5    9.820983e-01
40-10   1.000000e+00

I want it to be something like:

        0    5    10    20    ...
0       ...  ...  ...  ...
5       ...  ...  ...  ...
10      ...  ...  ...  ...
20      ...  ...  ...  ...
...

I haven't found anything similar online, so I don't know where to start.
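For reference, the imported dataframe can be reconstructed like this (a minimal sketch, assuming the group pairs sit in the index and the single column is named "p adj", as in the printout above):

import pandas as pd

# Rebuild the TukeyHSD export: "a-b" group pairs as the index, one "p adj" column.
data = {
    "50-0": 2.815526e-13, "60-0": 2.855494e-13, "20-0": 4.764197e-08,
    "50-5": 1.712389e-05, "50-10": 1.483440e-04, "50-40": 1.643480e-04,
    "60-5": 5.873007e-04, "60-10": 5.218047e-03, "60-40": 5.613566e-03,
    "10-0": 6.878476e-03, "40-0": 1.270855e-02, "20-5": 7.380859e-02,
    "50-20": 1.574372e-01, "40-20": 3.264569e-01, "20-10": 3.369147e-01,
    "5-0": 3.816166e-01, "60-50": 7.301423e-01, "60-20": 8.503578e-01,
    "10-5": 9.731384e-01, "40-5": 9.820983e-01, "40-10": 1.000000e+00,
}
df = pd.DataFrame({"p adj": data})
df.index.name = "Groups"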

2 Answers

Andrej Kesely (Best Answer)

Try:

df[["x", "y"]] = df.index.str.split("-", expand=True).to_frame().astype(int).values
print(pd.crosstab(df["x"], df["y"], df["p adj"], aggfunc="first"))

Prints:

y             0         5         10        20        40        50
x                                                                 
5   3.816166e-01       NaN       NaN       NaN       NaN       NaN
10  6.878476e-03  0.973138       NaN       NaN       NaN       NaN
20  4.764197e-08  0.073809  0.336915       NaN       NaN       NaN
40  1.270855e-02  0.982098  1.000000  0.326457       NaN       NaN
50  2.815526e-13  0.000017  0.000148  0.157437  0.000164       NaN
60  2.855494e-13  0.000587  0.005218  0.850358  0.005614  0.730142
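If a fully symmetric grid is preferred (each p-value at both (a, b) and (b, a)), one possible extension is to reindex the result to a square and mirror it across the diagonal. A sketch, where out, groups, and square are hypothetical names and the p-value for "a-b" is assumed to apply equally to "b-a":

# Make the triangular crosstab square and symmetric (a sketch, not part of the answer above).
out = pd.crosstab(df["x"], df["y"], values=df["p adj"], aggfunc="first")
groups = sorted(set(out.index) | set(out.columns))
square = out.reindex(index=groups, columns=groups)
square = square.combine_first(square.T)  # copy each p-value across the diagonal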
Davinder Singh

Just create a dataframe where:

  • the index is the unique values from the 0th split of the original dataframe's (df) index
  • the columns are the unique values from the 1st split of the original dataframe's (df) index

import numpy as np
import pandas as pd

# Empty frame whose axes are the sorted unique group numbers on each side of "-".
final = pd.DataFrame(
    {},
    index = df.index.map(lambda x: x.split('-')[0]).unique().map(lambda x: int(x)).sort_values(),
    columns = df.index.map(lambda x: x.split('-')[1]).unique().map(lambda x: int(x)).sort_values()
)

# Fill each (i, j) cell by looking up the "i-j" label in the original index;
# pairs that don't appear there are left as NaN.
for i in final.index:
    for j in final.columns:
        try:
            final.loc[i, j] = df.loc[f'{i}-{j}', 'p adj']
        except KeyError:
            final.loc[i, j] = np.nan

Output:

Groups        0         5         10        20        40        50
Groups                                                            
5       0.381617       NaN       NaN       NaN       NaN       NaN
10      0.006878  0.973138       NaN       NaN       NaN       NaN
20           0.0  0.073809  0.336915       NaN       NaN       NaN
40      0.012709  0.982098       1.0  0.326457       NaN       NaN
50      0.281553  0.000017  0.000148  0.157437  0.000164       NaN
60           0.0  0.000587  0.005218  0.850358  0.005614  0.730142

Then iterate over each (i, j) location of the final dataframe and fill the value by building the 'i-j' label on the fly and looking it up in the original dataframe's index.

There is likely a more optimal solution that fully uses pandas vectorization and doesn't need to rely on loops.
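For example, a loop-free sketch using pivot (pairs and vectorized are hypothetical names; pivot assumes each (x, y) pair occurs exactly once, which holds for a TukeyHSD table):

# Split the "a-b" labels once, then pivot straight into the matrix.
pairs = df.index.to_series().str.split("-", expand=True).astype(int)
vectorized = (
    df.assign(x=pairs[0].values, y=pairs[1].values)
      .pivot(index="x", columns="y", values="p adj")
)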