I'm trying to learn Rust and am using polars. I have a simple CSV file:
names,pdate,orders
alice,2023-02-12,2
alice,2023-02-18,1
alice,2023-02-22,6
bob,2022-12-10,1
bob,2022-12-14,1
bob,2022-12-30,4
I read it in using:
let mut df = CsvReader::from_path("t2.csv")
    .unwrap()
    .has_header(true)
    .finish()
    .unwrap();
println!("{}", df);
and it prints out the result as expected. However, I want to cast the column pdate into a date so that I can do further date arithmetic with it. I tried the solution here by doing so:
let dt_options = StrpTimeOptions {
    date_dtype: DataType::Date,
    fmt: Some("%Y-%m-%d".into()),
    ..Default::default()
};
let df = df.with_column(col("pdate").str().strptime(dt_options));
A cargo check gave the following error:
    Checking test v0.1.0 (/home/xxxx/a1/rustp)
error[E0277]: the trait bound `Expr: IntoSeries` is not satisfied
  --> test.rs:37:29
   |
37 |     let df = df.with_column(col("pdate").str().strptime(dt_options));
   |                 ----------- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `IntoSeries` is not implemented for `Expr`
   |                 |
   |                 required by a bound introduced by this call
   |
   = help: the following other types implement trait `IntoSeries`:
             Arc<(dyn SeriesTrait + 'static)>
             ChunkedArray<T>
             Logical<DateType, Int32Type>
             Logical<DatetimeType, Int64Type>
             Logical<DurationType, Int64Type>
             Logical<TimeType, Int64Type>
             polars::prelude::Series
This appears to be fairly basic functionality, but I've not been able to find a straightforward solution. Any help would be appreciated.
EDIT:
The following code works, but it raises a new problem: I'm trying to find the difference between two date columns in days as a float, but it comes out as a Duration.
let df2 = df
    .clone()
    .lazy()
    .with_column(col("pdate").str().strptime(dt_options).alias("dt_pdate"))
    .groupby(["names"])
    .agg([
        col("dt_pdate").shift(1).alias("prev_date"),
        col("orders"),
        col("dt_pdate"),
    ])
    .explode(["prev_date", "orders", "dt_pdate"])
    .select([all(), (col("dt_pdate") - col("prev_date")).alias("delta")])
    .collect()
    .unwrap();
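At first glance it looks like your df is a DataFrame, not a LazyFrame. Expressions such as col("pdate") are only accepted by the LazyFrame version of with_column; the DataFrame version expects a Series, which is what the error is telling you. You can get a LazyFrame from a DataFrame with df.lazy() and a DataFrame from a LazyFrame with lazy_df.collect().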
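As for the follow-up in the edit, about getting the delta in days as a float: a Duration is backed by a 64-bit integer count of time units (the error output in the question lists Logical<DurationType, Int64Type>). Below is a minimal sketch of the arithmetic in plain Rust. It assumes the unit is milliseconds, which you should check against the column's actual time unit. The helper name duration_ms_to_days is hypothetical and not part of polars.

```rust
/// Number of milliseconds in one day.
const MS_PER_DAY: f64 = 86_400_000.0;

/// Hypothetical helper (not a polars function): converts a duration given as
/// a whole number of milliseconds into fractional days.
/// Assumption: the duration's underlying integer is in milliseconds.
fn duration_ms_to_days(ms: i64) -> f64 {
    ms as f64 / MS_PER_DAY
}

fn main() {
    // 2023-02-12 to 2023-02-18 is six days, expressed here in milliseconds.
    let six_days_ms: i64 = 6 * 24 * 60 * 60 * 1000;
    println!("{}", duration_ms_to_days(six_days_ms)); // prints 6

    // A day and a half.
    println!("{}", duration_ms_to_days(129_600_000)); // prints 1.5
}
```

The same division could presumably be applied to the delta column once it has been cast to an integer or float type. The exact expression calls for that depend on the polars version, so treat this only as an illustration of the unit conversion.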