Rust polars: convert a string column to datetime

I'm learning Rust and using polars. I have a simple CSV file:

names,pdate,orders
alice,2023-02-12,2
alice,2023-02-18,1
alice,2023-02-22,6
bob,2022-12-10,1
bob,2022-12-14,1
bob,2022-12-30,4

I read it in using:

    let mut df = CsvReader::from_path("t2.csv")
        .unwrap()
        .has_header(true)
        .finish()
        .unwrap();
    println!("{}", df);

and it prints the result as expected. However, I want to cast the column pdate to a date so I can do further date arithmetic with it. I tried the solution here, as follows:

    let dt_options = StrpTimeOptions {
        date_dtype: DataType::Date,
        fmt: Some("%Y-%m-%d".into()),
        ..Default::default()
    };

    let df = df.with_column(col("pdate").str().strptime(dt_options));

Running cargo check gave the following error:

        Checking test v0.1.0 (/home/xxxx/a1/rustp)
    error[E0277]: the trait bound `Expr: IntoSeries` is not satisfied
        --> test.rs:37:29
         |
    37   |     let df = df.with_column(col("pdate").str().strptime(dt_options));
         |                 ----------- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `IntoSeries` is not implemented for `Expr`
         |                 |
         |                 required by a bound introduced by this call
         |
         = help: the following other types implement trait `IntoSeries`:
                   Arc<(dyn SeriesTrait + 'static)>
                   ChunkedArray<T>
                   Logical<DateType, Int32Type>
                   Logical<DatetimeType, Int64Type>
                   Logical<DurationType, Int64Type>
                   Logical<TimeType, Int64Type>
                   polars::prelude::Series

This seems like fairly basic functionality, but I have not been able to find a straightforward solution. Any help would be appreciated.

EDIT: The following code works, but it raises a new problem. I'm trying to get the difference between two date columns in days as a float, but the result comes out as a Duration:

    let df2 = df
        .clone()
        .lazy()
        .with_column(col("pdate").str().strptime(dt_options).alias("dt_pdate"))
        .groupby(["names"])
        .agg([
            col("dt_pdate").shift(1).alias("prev_date"),
            col("orders"),
            col("dt_pdate"),
        ])
        .explode(["prev_date", "orders", "dt_pdate"])
        .select([all(), (col("dt_pdate") - col("prev_date")).alias("delta")])
        .collect()
        .unwrap();

There is 1 best solution below

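At first glance, it looks like your df is a DataFrame, not a LazyFrame. DataFrame::with_column expects a Series (hence the IntoSeries bound in the error), whereas an expression such as col("pdate").str().strptime(dt_options) is something a LazyFrame's with_column accepts. You can get a LazyFrame from a DataFrame with df.lazy(), and a DataFrame back from a LazyFrame with lazy_df.collect().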
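On the follow-up about getting the difference in days as a float: it can help to know what numbers to expect for the sample data before looking for a polars-side conversion. The sketch below uses only the Rust standard library, not polars. It turns each "%Y-%m-%d" string into a day count since 1970-01-01 and subtracts consecutive rows within one group. The function names (days_from_civil, parse_ymd, deltas_in_days) are made up for this illustration. The Logical<DateType, Int32Type> entry in the error output suggests a polars Date is backed by a 32-bit integer, and as far as I know that integer is also a day count since the epoch, but treat that as an assumption and check it against the version you are using.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (the well-known "days from civil" arithmetic; month is 1..=12).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat January and February as months 11 and 12 of the previous year,
    // so the leap day falls at the end of the shifted year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y - era * 400; // 0..=399
    let shifted_month = (month + 9) % 12; // March = 0, ..., February = 11
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Parse "YYYY-MM-DD" into days since the epoch.
/// Returns None for malformed input (only a coarse range check is done).
fn parse_ymd(s: &str) -> Option<i64> {
    let mut parts = s.split('-');
    let year: i64 = parts.next()?.parse().ok()?;
    let month: i64 = parts.next()?.parse().ok()?;
    let day: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some(days_from_civil(year, month, day))
}

/// For one group of dates (already in row order), return the gap in days
/// between each row and the previous one. The first row has no previous
/// date, so its delta is None, like the null produced by shift(1).
fn deltas_in_days(dates: &[&str]) -> Vec<Option<f64>> {
    let days: Vec<Option<i64>> = dates.iter().map(|s| parse_ymd(s)).collect();
    let mut out = Vec::with_capacity(days.len());
    for i in 0..days.len() {
        let delta = if i == 0 {
            None
        } else {
            match (days[i - 1], days[i]) {
                (Some(prev), Some(cur)) => Some((cur - prev) as f64),
                _ => None,
            }
        };
        out.push(delta);
    }
    out
}
```

Calling deltas_in_days(&["2023-02-12", "2023-02-18", "2023-02-22"]) for alice gives None, 6.0, 4.0, and the bob rows give None, 4.0, 16.0. Those are the values the delta column should hold once the Duration is expressed in days.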