Is there any advantage to switching dtypes mid-pipeline? For example: str -> categorical (memory saving) -> str (for an operation) -> categorical, in order to make a non-streaming operation fit into memory?

Or does converting to categorical once, at the end of my operations, achieve the same thing where possible? I'm dealing with larger-than-memory datasets, and some of the operations aren't supported by streaming yet (e.g. .concat_list), so I want to keep things as small as possible while writing to file (.collect(streaming=True).write_parquet()), because sometimes my lists or strings are bigger than available memory.

Along with that, will sorting my DataFrame mid-workflow (before a non-streaming operation) reduce memory usage?
