Join Polar data frame with varying multiple similar columns

164 Views Asked by At

I am using polar library in python and have two data frames the look like this

import polars as pl

data1 = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}

df1 = pl.DataFrame(data1)

data2 = {
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [1, 2, 3]
}

df2 = pl.DataFrame(data2)

# the column B and C are same in both data frames
# TODO: Join/Concat the data frames into one.

The data2 can vary some time it can have 2 common columns, some time it can have 1 common column and some times more. and The result should look like. I was wondering if there is any built function, or some kind of flag in a function that exists already in polars, that i can use.

result = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [1, 2, 3]
}

I am not quite sure how to join or concat the polar data frames in order to achieve this.

1

There are 1 best solutions below

1
On BEST ANSWER

You can concat with how="align" but the resulting column order differs.

pl.concat([df1, df2], how="align")
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ B   ┆ C   ┆ A   ┆ D   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 4   ┆ 7   ┆ 1   ┆ 1   │
│ 5   ┆ 8   ┆ 2   ┆ 2   │
│ 6   ┆ 9   ┆ 3   ┆ 3   │
└─────┴─────┴─────┴─────┘

You can see how it is implemented here.

It basically finds the common columns to use as on and "outer joins" all the frames together.

dfs = [df1, df2]

cols = pl.concat(pl.Series(df.columns) for df in dfs)
join_cols = list(cols.filter(cols.is_duplicated()).unique())

result = dfs[0].join(dfs[1], how="outer", on=join_cols, suffix="")

# only needed for more than 2 frames
# for df in dfs[2:]:
#    result = result.join(df, how="outer", on=join_cols, suffix="")

result = result.select(*cols.unique(maintain_order=True))
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ A   ┆ B   ┆ C   ┆ D   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 7   ┆ 1   │
│ 2   ┆ 5   ┆ 8   ┆ 2   │
│ 3   ┆ 6   ┆ 9   ┆ 3   │
└─────┴─────┴─────┴─────┘