Add series to a polars dataframe by cycling to match the dataframe row count

This:

import numpy
import polars

df = polars.DataFrame(dict(
  j=numpy.random.randint(10, 99, 10)
  ))
print('--- df')
print(df)

s = polars.Series('k', numpy.random.randint(10, 99, 3))
print('--- s')
print(s)

dfj = (df
  .with_row_count()
  .with_columns(
    polars.col('row_nr') % len(s)
    )
  .join(s.to_frame().with_row_count(), on='row_nr')
  .drop('row_nr')
  )
print('--- dfj')
print(dfj)

produces:

--- df
 j (i64)
 47
 22
 82
 19
 85
 15
 89
 74
 26
 11
shape: (10, 1)
--- s
shape: (3,)
Series: 'k' [i64]
[
        86
        81
        16
]
--- dfj
 j (i64)  k (i64)
 47       86
 22       81
 82       16
 19       86
 85       81
 15       16
 89       86
 74       81
 26       16
 11       86
shape: (10, 2)

That is, it cycles series 'k' as needed to match the dataframe row count.

It looks a bit verbose. Is there a shorter (or more idiomatic) way to do this in polars?

Best answer

You could simplify your current approach a little by passing an expression to on= (here polars is imported as pl):

df.join(
   s.to_frame(),
   on = pl.int_range(0, pl.count()).mod(s.len()),
)
shape: (10, 3)
┌─────┬─────┬─────┐
│ j   ┆ int ┆ k   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 13  ┆ 0   ┆ 70  │
│ 75  ┆ 1   ┆ 97  │
│ 39  ┆ 2   ┆ 55  │
│ 74  ┆ 0   ┆ 70  │
│ 56  ┆ 1   ┆ 97  │
│ 52  ┆ 2   ┆ 55  │
│ 37  ┆ 0   ┆ 70  │
│ 69  ┆ 1   ┆ 97  │
│ 76  ┆ 2   ┆ 55  │
│ 40  ┆ 0   ┆ 70  │
└─────┴─────┴─────┘

pl.int_range() adds a column named int to the result, which would then need to be dropped.

If building a dict from the Series is acceptable, you could use .map_dict instead:

df.with_columns(k =
   pl.first().cumcount().mod(s.len()).map_dict(dict(enumerate(s)))
)
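
To make the lookup behind both approaches concrete, here is a plain-Python sketch that does not use polars at all: row i receives the series value at position i % len(s). The names cycle_to_length, values and n_rows are made up for illustration and are not part of polars; the sample numbers are the ones printed for s in the question.

```python
def cycle_to_length(values, n_rows):
    """Repeat `values` cyclically until the result has `n_rows` items."""
    if not values:
        raise ValueError("cannot cycle an empty sequence")
    # Same mapping as dict(enumerate(s)) in the answer: position -> value.
    lookup = dict(enumerate(values))
    # Row i is paired with position i % len(values), like the modulo join key.
    return [lookup[i % len(values)] for i in range(n_rows)]


# Hypothetical usage with the 3 values of s and the 10 rows of df.
k = cycle_to_length([86, 81, 16], 10)
print(k)
```

The resulting list follows the same repeating pattern as the k column in the question's dfj output.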