How would I generate combinations of items within Polars using the native expression API?


Is there a way to generate combinations of items within a list inside a Polars column without resorting to .map_elements() + itertools for each row?

This is my current solution:

import polars as pl
import itertools

(pl.DataFrame({'col': [['a', 'b', 'c']]})
   .with_columns(pl.col('col')
                   .map_elements(lambda list_o_things: [sorted((thing_1, thing_2))
                                                        for thing_1, thing_2 
                                                        in itertools.combinations(list_o_things, 2)])
                )
)

which returns this:

[['a', 'b'], ['a', 'c'], ['b', 'c']]


Best answer:

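Explode the list column, cross join the exploded frame with itself, filter on col < col_right so that each unordered pair is kept exactly once (already in sorted order), combine the two columns into a list with concat_list, and implode the result back into a single nested list.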

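import polars as pl

df = pl.DataFrame({'col': [['a', 'b', 'c']]})

(
    df
    .explode('col')                                # one row per list item
    .join(df.explode('col'), how='cross')          # every ordered pair (col, col_right)
    .filter(pl.col('col') < pl.col('col_right'))   # keep each unordered pair once
    # turn each pair into a list, then collect all pairs into one nested list
    .select(pl.concat_list('col', 'col_right').implode())
)

which returns:
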
shape: (1, 1)
┌──────────────────────────────────────┐
│ col                                  │
│ ---                                  │
│ list[list[str]]                      │
╞══════════════════════════════════════╡
│ [["a", "b"], ["a", "c"], ["b", "c"]] │
└──────────────────────────────────────┘