Edit for clarity
I have a set of dataframes that I need to combine programatically. Each one represents a particular participant over a set of treatment sessions. Each frame comes from 1 sheet of an excel workbook (one workbook per client, 1 worksheet per psychometric instrument) which I have imported into pandas.
The data frame below is an example of Participant a, instrument a:
| | beaseline | time 1 | time 1 change | time 1 change from baseline | time 2 | time 2 change | time 2 change from baseline |
|--------- |----------- |-------- |--------------- |----------------------------- |-------- |--------------- |----------------------------- |
| item 1 | 3 | 4 | Nan | 1 | 4 | 0 | 1 |
| item 2 | 3 | 2 | Nan | 0 | 3 | 1 | 0 |
| item 3 | 1 | 1 | Nan | 0 | 2 | 1 | 1 |
| item 4 | 1 | 1 | Nan | 0 | 1 | 0 | 0 |
| item 5 | 1 | 2 | Nan | 1 | 1 | -1 | 0 |
I need to combine this with observations from other participants in a tabular dataframe such that each row is a participant and each column is a reponse to an item at a particular time point.*
So I need to reshaped this data into something like this:
| | Baseline item 1 | Baseline item 2 | Baseline item 3 | Baseline item 4 | Baseline item 5 | time 1 item 1 | time 1 item 2 |
|--- |----------------- |----------------- |----------------- |----------------- |----------------- |--------------- |--------------- |
| 0 | 3 | 3 | 1 | 1 | 1 | 4 | 2 |
Once I have done this reshaping for each dataframe I can then append them together and add the client code as the first column, thereby distinguishing each participant.
| | Client code | Baseline item 1 | Baseline item 2 | Baseline item 3 | Baseline item 4 | Baseline item 5 | time 1 item 1 | time 1 item 2 |
|--- |------------- |----------------- |----------------- |----------------- |----------------- |----------------- |--------------- |--------------- |
| 0 | xxi212121 | 3 | 3 | 1 | 1 | 1 | 4 | 2 | | 4 | 2 |
I have already used a lot of repeated code and essentially manually pulled each slice as a series, renamed it and then created a new dataframe:
baseline = pd.Dataframe(raw_data.iloc[2:,1])
time_1 = pd.Dataframe(raw_data.iloc[2:,2])
but obviously this is a really silly solution.
I need to able to repeat this across a lot of similar dataframes so I need a more useful solution, Any help would be really appreciated.
Create and flatten the
MultiIndex
columns, then withnp.hstack
horizontally stack all the rows in the dataframe and create a new dataframe fromstacked
rows with the flattened columns:Example (column names removed for simplicity):