I have the following dataframe (pandas version 0.13.1)
>>> import pandas as pd
>>> DF = pd.DataFrame({'Group':['G1','G1','G2','G2'],'Start':['10','10','12','13'],'End':['13','13','14','15'],'Sample':['S1','S2','S3','S3'],'Status':['yes','yes','no','yes'],'pValue':[0.13,0.12,0.96,0.76],'pValueString':['13/100','12/100','96/100','76/100'],'desc':['aaaaaa','bbbbbb','aaaaaa','cccccc']})
>>> DF
End Group Sample Start Status pValue pValueString desc
0 13 G1 S1 10 yes 0.13 13/100 aaaaaa
1 13 G1 S2 10 no 0.12 12/100 bbbbbb
2 14 G2 S3 12 no 0.96 96/100 aaaaaa
3 15 G2 S3 13 yes 0.76 76/100 cccccc
[4 rows x 8 columns]
To the dataframe above
- I would like to groupby 'Group'.
- Then groupby a Start-End couplet.
- Pivot the sample values for each group. aggregate by max(pValue)
- Get the corresponding Status, desc corresponding to the sample with the higher pvalue and replace its value with a pValueString.
I need to ultimately get this to the following format
Group Start End Sample Status desc
S1 S2
G1 10 13 13/100 12/100 yes aaaaaa
S3
G2 12 14 96/100 no aaaaaa
13 15 76/100 yes cccccc
I have tried to use pivot_table and groupby but to no avail. Any help would be much appreciated.
I have
grouped=DF.groupby('Group')
for g,v in grouped: pandas.pivot_table(data=v,values=['pValue','pValueString']),rows=['Group','Start','End'],cols=['Sample'])['pValueString']
How do I get the corresponding desc and Status?
For pandas pivot table, you pass the rows you want as
index
and the columns you want ascolums
:Then for your