Collapse a dataframe based on virtual grid


I am a beginner in pandas and need some help. I have the following dummy data:

import numpy as np
import pandas as pd

raw_data = {
    "Unnamed: 0": ["Index_with_NaNs", 1., np.nan, 2., np.nan, np.nan, 3., np.nan, np.nan, np.nan],
    "A": ['Ali', 'a', 'e', 'i', np.nan, 'q', 'u', 'y', 'c3', 'g7'],
    "B": ['Bob', 'b', 'f', 'j', 'n', 'r', 'v', 'z', 'd4', 'h8'],
    "C": ['Cha', 'c', 'g', 'k', 'o', 's', np.nan, 'a1', np.nan, 'i9'],
    "D": ['Den', 'd', 'h', 'l', 'p', 't', 'x', np.nan, 'f6', 'j10'],
}

raw_df = pd.DataFrame(raw_data)
display(raw_df)

For each of the columns A, B, C and D, I want to concatenate the values vertically (using a space separator) until the next non-NaN integer value is found in the "Unnamed: 0" column. Note that my raw_df contains NaN values.

My virtual grid:

[Image: My virtual grid]

The end result should look like this:

[Image: Final dataframe]

I created the boolean Series I wanted, but I need some help "sectioning" the dataframe:

# Convert to string type and split indices based on the regex pattern
split_indices = raw_df["Unnamed: 0"].astype(str).str.match(r"[1-9]{1}\.0")
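
One way such a boolean Series could be used for sectioning is a cumulative sum: each True bumps the counter, so every row gets the number of the section it belongs to. This is only a minimal sketch, assuming each match should open a new section; the names `starts` and `section_id` are illustrative, and `na=False` is added so that missing values count as non-matches.

```python
import numpy as np
import pandas as pd

# Same "Unnamed: 0" column as in the dummy data above
col = pd.Series(
    ["Index_with_NaNs", 1.0, np.nan, 2.0, np.nan, np.nan, 3.0, np.nan, np.nan, np.nan],
    name="Unnamed: 0",
)

# True where the stringified value looks like "1.0" ... "9.0"
starts = col.astype(str).str.match(r"[1-9]{1}\.0", na=False)

# Every True opens a new section, so the running total labels each row
section_id = starts.cumsum()
print(section_id.tolist())
```

Rows labelled 0 come before the first match (here, the extra header row); the resulting labels could then be passed to `groupby`.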

There is 1 best solution below

Andrej Kesely

Try:

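# Fold the first row (the real header names) into the column labels, then drop it
raw_df.columns = [
    f"{c} {v}" for c, v in zip(raw_df.columns, raw_df.iloc[0].astype(str))
]
raw_df = raw_df[1:]

# Each non-NaN value in the first column starts a new group (cumsum of the mask);
# within each group, join the non-NaN values of every column with a space
raw_df = (
    raw_df.groupby(raw_df[raw_df.columns[0]].notna().cumsum())
    .agg(lambda s: " ".join(s[s.notna()].astype(str)))
    .reset_index(drop=True)
)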

print(raw_df)

Prints:

  Unnamed: 0 Index_with_NaNs      A Ali      B Bob  C Cha     D Den
0                        1.0        a e        b f    c g       d h
1                        2.0        i q      j n r  k o s     l p t
2                        3.0  u y c3 g7  v z d4 h8  a1 i9  x f6 j10
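
The same idea as a self-contained sketch, for anyone who wants to run it in isolation. The helper name `collapse_sections` and the small `demo` frame are made up for illustration; they are not part of pandas or of the question's data.

```python
import numpy as np
import pandas as pd


def collapse_sections(df, key_col):
    """Join the non-NaN values of every column within each section.

    A new section starts at every non-NaN value in key_col.
    (Hypothetical helper for illustration, not a pandas function.)
    """
    # Running count of non-NaN keys gives one label per section
    section_id = df[key_col].notna().cumsum()
    return (
        df.groupby(section_id)
        .agg(lambda s: " ".join(s.dropna().astype(str)))
        .reset_index(drop=True)
    )


# Small made-up frame with two sections (keys 1.0 and 2.0)
demo = pd.DataFrame(
    {
        "key": [1.0, np.nan, 2.0, np.nan, np.nan],
        "A": ["a", "e", "i", np.nan, "q"],
        "B": ["b", "f", "j", "n", "r"],
    }
)

out = collapse_sections(demo, "key")
print(out)
```

Note that this assumes the frame starts with a non-NaN key; any rows before the first key would end up in their own group labelled 0.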