This:
import numpy
import polars

df = polars.DataFrame(dict(
    j=numpy.random.randint(10, 99, 10)
))
print('--- df')
print(df)
s = polars.Series('k', numpy.random.randint(10, 99, 3))
print('--- s')
print(s)
dfj = (df
    .with_row_count()
    # wrap the row number so it cycles through 0 .. len(s) - 1
    .with_columns(
        polars.col('row_nr') % len(s)
    )
    # match each wrapped row number to the series element at that position
    .join(s.to_frame().with_row_count(), on='row_nr')
    .drop('row_nr')
)
print('--- dfj')
print(dfj)
produces:
--- df
j (i64)
47
22
82
19
85
15
89
74
26
11
shape: (10, 1)
--- s
shape: (3,)
Series: 'k' [i64]
[
86
81
16
]
--- dfj
j (i64) k (i64)
47 86
22 81
82 16
19 86
85 81
15 16
89 86
74 81
26 16
11 86
shape: (10, 2)
That is, it cycles series 'k' as needed to match the dataframe row count.
It looks a bit verbose. Is there a shorter (or more idiomatic) way to do this in polars?
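To be explicit about the intended result: row i of the dataframe should get element i % len(s) of the series. In plain Python, with the 'k' values from the output above hard-coded as a list (the names k, n_rows and cycled are only for illustration), that would be roughly:

```python
# Values of series 'k' from the output above, as a plain list.
k = [86, 81, 16]
n_rows = 10  # row count of df

# Row i is paired with element i % len(k), so the series repeats every 3 rows.
cycled = [k[i % len(k)] for i in range(n_rows)]
print(cycled)  # [86, 81, 16, 86, 81, 16, 86, 81, 16, 86]
```

This matches the 'k' column of dfj shown above.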
You could simplify your current approach a little by passing an expression to on=. Note that .int_range() adds an int column, which would need to be dropped.
If calling dict() on the Series is okay, you could use .map_dict instead.
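As a rough sketch of the idea behind the dict() route, assuming the dict is keyed by position in the Series: each row number, taken modulo the Series length, is looked up in that dict. The snippet below uses a plain list as a stand-in for the Series and an ordinary dict lookup in place of .map_dict, so it only illustrates the mapping, not the polars call itself. The names lookup, row_keys and mapped are hypothetical.

```python
# Stand-in for series 'k' (values taken from the question's output).
k = [86, 81, 16]

# Dict keyed by position: {0: 86, 1: 81, 2: 16}
lookup = dict(enumerate(k))

# Row numbers of a 10-row frame, wrapped to the length of the series.
row_keys = [i % len(k) for i in range(10)]

# Replace each key by its value, as a dict-based mapping would.
mapped = [lookup[key] for key in row_keys]
print(mapped)
```

With this route there is no join and no helper column to drop, at the cost of materialising the Series as a Python dict.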