I am trying to understand the logic behind pd.to_numeric float downcasting. I'm looking for the specific condition(s) that are used.
I expected it to preserve the uniqueness of the values, but in my example it does not.
The docs do not explain the logic for any of the downcasting options: float, integer and unsigned. I would love to understand them all.
import pandas as pd
import numpy as np
s = pd.Series(np.random.uniform(0, 1, 100_000), dtype="Float64")
s_float32 = pd.to_numeric(s, downcast="float")
print(s_float32.dtype)
print(s_float32.nunique() == s.nunique())
Output:
Float32
False
You can read the source of pd.to_numeric, which contains the code responsible for the downcast. The key is:
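A minimal sketch of that lookup, assuming the relevant piece is NumPy's table of float type codes. The variable name float_codes is illustrative and not taken from the pandas source; the names and sizes printed depend on the platform.

```python
import numpy as np

# NumPy keeps the float type codes in a single string, ordered by size.
float_codes = np.typecodes["Float"]

# Print each code with the dtype it maps to and its size in bytes.
for code in float_codes:
    dt = np.dtype(code)
    print(code, dt.name, dt.itemsize)
```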
From the documentation, the letters stand for: e is float16, f is float32, d is float64, g is float128.
The algorithm is to try each dtype from the smallest to the largest.
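The smallest-to-largest search can be sketched roughly as below. This is not the pandas implementation: downcast_float_sketch is a hypothetical helper, and both the decision to start at float32 and the absolute tolerance used as the acceptance test are assumptions made for illustration. The point it shows is that a closeness check accepts float32 even when distinct float64 values collapse to the same float32 value.

```python
import numpy as np


def downcast_float_sketch(values: np.ndarray, atol: float = 5e-4) -> np.ndarray:
    """Hypothetical helper: try float dtypes from smallest to largest."""
    codes = np.typecodes["Float"]
    # Assumption: float16 is skipped, so the search starts at float32.
    codes = codes[codes.index(np.dtype(np.float32).char):]
    for code in codes:
        dtype = np.dtype(code)
        if dtype.itemsize > values.dtype.itemsize:
            continue  # never widen the input
        candidate = values.astype(dtype)
        # Assumption: a candidate is accepted when it is close to the
        # original within an absolute tolerance, not when it is identical.
        if np.allclose(candidate, values, rtol=0.0, atol=atol, equal_nan=True):
            return candidate
    return values


rng = np.random.default_rng(0)
values = rng.uniform(0, 1, 100_000)
small = downcast_float_sketch(values)

print(small.dtype)
# Compare the number of distinct values before and after the downcast.
print(np.unique(values).size, np.unique(small).size)
```

Under these assumptions, uniqueness is never part of the test: the loop only asks whether the rounded values stay close to the originals, so a lossy float32 result is accepted for data in [0, 1).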
Update
And yet it's all there. The code resolution looks like:
Output: