Polars DataFrame changes null to np.nan in an Int column when using .to_numpy()


In Polars, we can use .to_numpy() to convert a polars.DataFrame into a numpy.ndarray. But if there are None values, Polars stores them as null, and when calling to_numpy(), each null is converted to np.nan, which turns an int array into a float array.

import polars as pl

t = pl.DataFrame({
    'a': [1, 4, 3, 5],
    'b': [3, 6, 2, None],
})

print(t.to_numpy())

[[ 1.  3.]
 [ 4.  6.]
 [ 3.  2.]
 [ 5. nan]]

How can I avoid this and keep null as None when converting the DataFrame to an ndarray?


BEST ANSWER

NumPy has no null value for integer dtypes (NaN only exists for floats), so you can't keep the column as int while preserving the missing value.

If you really need None in a numpy array, you could cast to pl.Object first:

import polars as pl

t = pl.DataFrame({
    'a': [1, 4, 3, 5],
    'b': [3, 6, 2, None],
}, schema_overrides={'b': pl.Object})

print(t.to_numpy())

[[1 3]
 [4 6]
 [3 2]
 [5 None]]

But NumPy doesn't really handle missing data, so I'd suggest you first impute the missing values and then convert to NumPy.