I have a polars dataframe with columns a_0, a_1, a_2, b_0, b_1, b_2. I want to convert it to a longer and thinner dataframe (3 x rows, but just 2 columns a and b), so that a contains a_0[0], a_1[0], a_2[0], a_0[1], a_1[1], a_2[1],... and the same for b. How can I do that?
merge groups of columns in a polars dataframe to single columns
114 Views Asked by DeltaIV At
3
There are 3 best solutions below
1
On
You can unpivot the dataframe from wide to long separately for each column using pl.DataFrame.melt and subsequently concatenate the results using pl.concat.
import polars.selectors as cs
pl.concat(
[
df.melt(value_vars=cs.starts_with(x), value_name=x).drop("variable")
for x in ["a", "b"]
],
how="horizontal"
)
Note that I've used column selectors to select all column starting with "a" / "b".
Output.
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 0 ┆ a │
│ 1 ┆ b │
│ 2 ┆ c │
└─────┴─────┘
2
On
The pattern I've been using for this is:
- Use a bare
.melt()to go from wide to long - Clean the "names" to produce the new column names
- Enumerate each "group" to give an index/row_number
- Re-pivot
df = pl.DataFrame(dict(a_0=1, a_1=2, a_2=3, b_0=4, b_1=5, b_2=6))
(df.melt()
.with_columns(pl.col("variable").str.replace("_.*", ""))
.with_columns(index = pl.int_range(pl.len()).over("variable"))
)
shape: (6, 3)
┌──────────┬───────┬───────┐
│ variable ┆ value ┆ index │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞══════════╪═══════╪═══════╡
│ a ┆ 1 ┆ 0 │
│ a ┆ 2 ┆ 1 │
│ a ┆ 3 ┆ 2 │
│ b ┆ 4 ┆ 0 │
│ b ┆ 5 ┆ 1 │
│ b ┆ 6 ┆ 2 │
└──────────┴───────┴───────┘
Which we then .pivot()
(df.melt()
.with_columns(pl.col("variable").str.replace("_.*", ""))
.with_columns(index = pl.int_range(pl.len()).over("variable"))
.pivot(index="index", columns="variable", values="value")
)
shape: (3, 3)
┌───────┬─────┬─────┐
│ index ┆ a ┆ b │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 0 ┆ 1 ┆ 4 │
│ 1 ┆ 2 ┆ 5 │
│ 2 ┆ 3 ┆ 6 │
└───────┴─────┴─────┘
You can use
concat_list()to join the columns you want together and then useexplode()to convert them into rows.Let's take simple data frame as an example:
Now, you can reshape it. First, concat the columns into lists and rename the columns for the final result:
No, explode lists into rows: