I have this script on polars...
import polars as pl
from faker import Faker
fake = Faker(locale='fr_FR')
def my_func():
operation = fake.unique.pystr_format(
string_format='??###',
letters='abcdefghijklmnopqrstuvwxyz').upper()
return operation
my_df = pl.DataFrame({
"Date":[date(2023, 1, 2), date(2023, 1, 2), date(2023, 1, 2),
date(2023, 3, 5), date(2023, 3, 17), date(2023, 3, 17),
date(2023, 3, 17), date(2023, 3, 17), date(2023, 5, 1),
date(2023, 5, 17), date(2023, 5, 17), date(2023, 5, 17)],
"Invoice":['001', '001', '001', '002',
'003', '003', '003', '003',
'004', '005', '005', '005'],
"Amount":[250, 250, 250, 300,
450, 450, 450, 450,
500, 650, 650, 650],
'Operation':['', '', '', '',
'', '', '', '',
'', '', '', '',]
})
my_df.with_columns(
pl.when(pl.col('Invoice') != pl.col('Invoice').shift())
.then(pl.col('Operation').map_elements(lambda x : my_func()))
.otherwise(pl.col('Operation').shift()))
It gives the following result:
| Date | Invoice | Amount | Operation |
|---|---|---|---|
| 2023-01-02 00:00:00 | 001 | 250 | |
| 2023-01-02 00:00:00 | 001 | 250 | |
| 2023-01-02 00:00:00 | 001 | 250 | |
| 2023-03-05 00:00:00 | 002 | 300 | WV068 |
| 2023-03-17 00:00:00 | 003 | 450 | KL429 |
| 2023-03-17 00:00:00 | 003 | 450 | |
| 2023-03-17 00:00:00 | 003 | 450 | |
| 2023-03-17 00:00:00 | 003 | 450 | |
| 2023-05-01 00:00:00 | 004 | 500 | VF254 |
| 2023-05-17 00:00:00 | 005 | 650 | NT977 |
| 2023-05-17 00:00:00 | 005 | 650 | |
| 2023-05-17 00:00:00 | 005 | 650 |
I have a problem with the shift() function -> I can't get this result:
| Date | Invoice | Amount | Operation |
|---|---|---|---|
| 2023-01-02 00:00:00 | 001 | 250 | |
| 2023-01-02 00:00:00 | 001 | 250 | |
| 2023-01-02 00:00:00 | 001 | 250 | |
| 2023-03-05 00:00:00 | 002 | 300 | WV068 |
| 2023-03-17 00:00:00 | 003 | 450 | KL429 |
| 2023-03-17 00:00:00 | 003 | 450 | KL429 |
| 2023-03-17 00:00:00 | 003 | 450 | KL429 |
| 2023-03-17 00:00:00 | 003 | 450 | KL429 |
| 2023-05-01 00:00:00 | 004 | 500 | VF254 |
| 2023-05-17 00:00:00 | 005 | 650 | NT977 |
| 2023-05-17 00:00:00 | 005 | 650 | NT977 |
| 2023-05-17 00:00:00 | 005 | 650 | NT977 |
Can you please help me?
Thank you in advance
It looks like you may want
.forward_fill()instead of.shift()As for the actual "task", it looks like you're trying to group consecutive "runs" of the
Invoicecolumn.Polars has a specific function for that:
.rle_id()