How to expand multiple list column in rows in python

78 Views Asked by At

I have a DataFrame like this:

ID MS DS
654 1500,10000,20000,30000 60,365,730
131 1500,10000,20000 60,365,730
598 1500,10000,20000,30000 60,365,730

The desired output looks like this:

ID MS DS
654 1500 60
654 10000 365
654 20000 730
654 30000 Nan
131 1500 60
131 10000 365
131 20000 730
598 1500 60

I have tried the following code:

from itertools import zip_longest

df.apply(lambda row: list(zip_longest(row['MS'], 
                                  row['DS'], 
                                  fillvalue = 'Nan')),axis = 1)

This did not work as expected, as it creates a Series with list values like this: [('1500,10000,20000,30000', '60,365,730')], [('1500,10000,20000', '60,365,730')].

2

There are 2 best solutions below

0
ouroboros1 On BEST ANSWER

Here's one approach:

cols = ['MS', 'DS']

dfs = [df[c].explode().str.split(',', expand=True)
       .astype('Int64')
       .stack()
       .rename(c) 
       for c in cols]

out = (pd.concat(dfs, axis=1)
       .droplevel(1)
       .join(df['ID'])
       .loc[:, df.columns]
       )

out

    ID     MS    DS
0  654   1500    60
0  654  10000   365
0  654  20000   730
0  654  30000  <NA>
1  131   1500    60
1  131  10000   365
1  131  20000   730
2  598   1500    60
2  598  10000   365
2  598  20000   730
2  598  30000  <NA>

Explanation

  1. Inside a list comprehension, for each column of ['MS', 'DS']:
    • use Series.explode to get list elements into multiple rows;
    • use Series.str.split to split on ',' and set expand to True to get the result in multiple columns;
    • chain Series.astype with Int64 to turn what will be string values into proper integers, while allowing for NaN values (the shorter sequences will have these for columns not "reached");
    • apply df.stack to get a pd.Series result (with column values now as a second level index);
    • add Series.rename to restore the original column name.
  2. Join both Series, add non-exploded column, and re-order:
    • use pd.concat along axis=1 (columns)
    • apply df.droplevel to drop the added index level;
    • use df.join to re-add the non-exploded column ID;
    • pass df.columns to df.loc to reset the column order.

Data used

import pandas as pd

data = {
    'ID': [654, 131, 598],
    'MS': [['1500,10000,20000,30000'], ['1500,10000,20000'], 
           ['1500,10000,20000,30000']],
    'DS': [['60,365,730'], ['60,365,730'], 
           ['60,365,730']]
}

df = pd.DataFrame(data)
df

    ID                        MS            DS
0  654  [1500,10000,20000,30000]  [60,365,730]
1  131        [1500,10000,20000]  [60,365,730]
2  598  [1500,10000,20000,30000]  [60,365,730]
0
Bushmaster On

You can use str.split and explode:

df[["MS","DS"]] = df[["MS","DS"]].apply(lambda x: x.str.split(","),axis=1)
df["final"] = df[["MS","DS"]].apply(lambda x: list(zip_longest(x["MS"], x["DS"])),axis=1)
df = df.explode("final")
df[["MS","DS"]] = df["final"].values.tolist()

Out:

    ID     MS    DS
0  654   1500    60
0  654  10000   365
0  654  20000   730
0  654  30000  None
1  131   1500    60
1  131  10000   365
1  131  20000   730
2  598   1500    60
2  598  10000   365
2  598  20000   730
2  598  30000  None