I have a DataFrame like this:
| ID | MS | DS |
|---|---|---|
| 654 | 1500,10000,20000,30000 | 60,365,730 |
| 131 | 1500,10000,20000 | 60,365,730 |
| 598 | 1500,10000,20000,30000 | 60,365,730 |
The desired output looks like this:
| ID | MS | DS |
|---|---|---|
| 654 | 1500 | 60 |
| 654 | 10000 | 365 |
| 654 | 20000 | 730 |
| 654 | 30000 | Nan |
| 131 | 1500 | 60 |
| 131 | 10000 | 365 |
| 131 | 20000 | 730 |
| 598 | 1500 | 60 |
I have tried the following code:
from itertools import zip_longest
df.apply(lambda row: list(zip_longest(row['MS'],
row['DS'],
fillvalue = 'Nan')),axis = 1)
This did not work as expected, as it creates a Series with list values like this: [('1500,10000,20000,30000', '60,365,730')], [('1500,10000,20000', '60,365,730')].
Here's one approach:
Explanation
['MS', 'DS']:Series.explodeto get list elements into multiple rows;Series.str.splitto split on','and setexpandtoTrueto get the result in multiple columns;Series.astypewithInt64to turn what will be string values into proper integers, while allowing forNaNvalues (the shorter sequences will have these for columns not "reached");df.stackto get apd.Seriesresult (with column values now as a second level index);Series.renameto restore the original column name.Series, add non-exploded column, and re-order:pd.concatalongaxis=1(columns)df.droplevelto drop the added index level;df.jointo re-add the non-exploded columnID;df.columnstodf.locto reset the column order.Data used