How to convert a DataFrame from two rows to a single row


I have a pandas DataFrame with 16 columns (sec01 to sec16) and 2 rows (x_data and y_data).

I want to convert it to another DataFrame with 1 row and 32 columns (x_sec01, y_sec01, ..., x_sec16, y_sec16).

       sec01    sec02   sec03   sec04   sec05   sec06   sec07   sec08   sec09   sec10   sec11   sec12   sec13   sec14   sec15   sec16
x_data  -0  -0  -0  -0  -0  -0  -0  -0  -0  0   0   0   0.557336    0.604719    0.733460    1.019466
y_data  0   0   0   0   0   0   0   0   0   0   0   0   3.212005    2.692938    2.650384    3.085967

What is the simplest way to do this?

There are 3 best solutions below

1
On BEST ANSWER

Reconstruct the frame with pd.DataFrame by transposing and reshaping the underlying array:

pd.DataFrame(
    df.values.T.reshape(1, -1),
    columns=['{}_{}'.format(r, c)
             for c in df.columns.tolist()
             for r in df.index.str[0].tolist()])
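As a minimal sketch of why the transpose-then-reshape trick interleaves the values, the snippet below applies the same idea to a small stand-in frame. The 2 x 3 frame and the names `values`, `prefixes`, `labels`, and `wide` are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd

# Hypothetical 2 x 3 stand-in for the 2 x 16 frame in the question.
df = pd.DataFrame(
    [[0.0, 0.557336, 0.604719],
     [0.0, 3.212005, 2.692938]],
    index=["x_data", "y_data"],
    columns=["sec01", "sec02", "sec03"],
)

# After transposing, each row holds the (x, y) pair of one section, so
# flattening row by row interleaves x and y section by section.
values = df.to_numpy().T.reshape(1, -1)

# Build the labels in the same order: outer loop over sections,
# inner loop over the 'x' / 'y' prefixes taken from the index.
prefixes = df.index.str[0].tolist()
labels = ["{}_{}".format(p, c) for c in df.columns for p in prefixes]

wide = pd.DataFrame(values, columns=labels)
print(wide)
```

The order of the two loops in the list comprehension has to match the order in which the reshape walks the array; swapping them would attach the labels to the wrong values.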

0
On

You can first create a MultiIndex from the index with str.split, then move the x/y level into the columns with unstack:

df.index = df.index.str.split('_', expand=True)
df1 = df.unstack(0)
df1.columns = ['_'.join((x[1], x[0])) for x in df1.columns]
print (df1)
      x_sec01  y_sec01  x_sec02  y_sec02  x_sec03  y_sec03  x_sec04  y_sec04  \
data        0        0        0        0        0        0        0        0   

      x_sec05  y_sec05    ...     x_sec12  y_sec12   x_sec13   y_sec13  \
data        0        0    ...           0        0  0.557336  3.212005   

       x_sec14   y_sec14  x_sec15   y_sec15   x_sec16   y_sec16  
data  0.604719  2.692938  0.73346  2.650384  1.019466  3.085967  

[1 rows x 32 columns]
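To make the intermediate steps of the unstack approach easier to follow, here is a small sketch on a hypothetical 2 x 2 frame. The sample data and the names `wide` and `flat_labels` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical 2 x 2 stand-in for the frame in the question.
df = pd.DataFrame(
    [[0.0, 0.557336],
     [0.0, 3.212005]],
    index=["x_data", "y_data"],
    columns=["sec01", "sec02"],
)

wide = df.copy()

# 'x_data' -> ('x', 'data'): the index becomes a two-level MultiIndex.
wide.index = wide.index.str.split("_", expand=True)

# Move level 0 ('x' / 'y') into the columns; the single remaining
# index label is 'data', and the columns are (section, axis) pairs.
wide = wide.unstack(0)

# Flatten the (section, axis) pairs into 'axis_section' labels.
flat_labels = ["_".join((axis, sec)) for sec, axis in wide.columns]
wide.columns = flat_labels
print(wide)
```

Working on a copy keeps the original `df` index intact, which matters if the frame is reused afterwards.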

Another solution with concat:

df1 = pd.concat([df.loc['x_data'], df.loc['y_data']], keys=('x', 'y')).to_frame().sort_index(level=1)
df1.index = ['_'.join(x) for x in df1.index]
print (df1.T)
   x_sec01  y_sec01  x_sec02  y_sec02  x_sec03  y_sec03  x_sec04  y_sec04  \
0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0   

   x_sec05  y_sec05    ...     x_sec12  y_sec12   x_sec13   y_sec13   x_sec14  \
0      0.0      0.0    ...         0.0      0.0  0.557336  3.212005  0.604719   

    y_sec14  x_sec15   y_sec15   x_sec16   y_sec16  
0  2.692938  0.73346  2.650384  1.019466  3.085967  

[1 rows x 32 columns]
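The concat idea can also be sketched with the sort applied to the Series before it is turned into a frame. This is an illustrative variant rather than the answer's exact code; the 2 x 2 sample frame and the names `stacked` and `wide` are assumptions.

```python
import pandas as pd

# Hypothetical 2 x 2 stand-in for the frame in the question.
df = pd.DataFrame(
    [[0.0, 0.557336],
     [0.0, 3.212005]],
    index=["x_data", "y_data"],
    columns=["sec01", "sec02"],
)

# Stack the two rows into one Series keyed by ('x' | 'y', section).
stacked = pd.concat([df.loc["x_data"], df.loc["y_data"]], keys=("x", "y"))

# Sort by the section level first; the remaining level ('x' / 'y') is
# used as a tie-breaker, which interleaves x and y per section.
stacked = stacked.sort_index(level=1)

# Flatten the two-level index and turn the Series into a one-row frame.
stacked.index = ["_".join(pair) for pair in stacked.index]
wide = stacked.to_frame().T
print(wide)
```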
0
On

The crux of the solution lies in using numpy.reshape by providing Fortran-like index ordering (order='F') to get your data in the required shape.

After that, it is just a matter of formatting the headers of the resulting DataFrame, which are obtained by stacking the index and column labels depth-wise with numpy.dstack.

idx = df.index.str[0]
col = df.columns
head = np.dstack(('{}_'.format(idx[0]) + col, '{}_'.format(idx[1]) + col)).ravel()
pd.DataFrame(df.values.reshape(1, 32, order='F'), columns=head)

produces the following output:

   x_sec01  y_sec01  x_sec02  y_sec02  x_sec03  y_sec03  x_sec04  y_sec04  \
0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0   

   x_sec05  y_sec05    ...     x_sec12  y_sec12   x_sec13   y_sec13   x_sec14  \
0      0.0      0.0    ...         0.0      0.0  0.557336  3.212005  0.604719   

    y_sec14  x_sec15   y_sec15   x_sec16   y_sec16  
0  2.692938  0.73346  2.650384  1.019466  3.085967  

[1 rows x 32 columns]
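For readers unfamiliar with the order argument, the following sketch contrasts the default C ordering with Fortran ordering on a tiny array. The array contents and variable names are made up for illustration.

```python
import numpy as np

# Row 0 plays the role of x_data, row 1 the role of y_data.
arr = np.array([[1, 2, 3],
                [10, 20, 30]])

# Default (C) order reads row by row: all x values, then all y values.
c_order = arr.reshape(1, -1)

# Fortran order reads column by column: x and y alternate per column.
f_order = arr.reshape(1, -1, order="F")

print(c_order)  # [[ 1  2  3 10 20 30]]
print(f_order)  # [[ 1 10  2 20  3 30]]
```

Reshaping with order='F' gives the same values as transposing first and reshaping in the default order, so this answer and the transpose-based one differ only in how the interleaving is expressed.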