np.array([5.3, 1.2, 76.1, 'Alice', 'Bob', 'Claire'])
I am wondering why this gives a dtype of U32, whereas the following code gives a dtype of U6.
np.array(['Alice', 'Bob', 'Claire', 5.3, 1.2, 76.1])
NumPy tries to store data efficiently by working out how much space each element needs.
In the first example, NumPy sees 5.3 first and, following its datatype conversion rules, puts it into a 32-code-point string data type (U32). When it reaches the strings later in the array, they all fit within 32 code points, so the data type does not have to change.
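To check this on your own install, a minimal sketch like the one below prints the inferred dtype and the number of code points it can hold. The helper name `codepoints` is my own and not part of NumPy. It relies on a U dtype using 4 bytes per code point. The widths reported for mixed input can differ between NumPy versions, so the sketch prints them rather than hard-coding expected values.

```python
import numpy as np

def codepoints(arr):
    """Return how many code points each element of a unicode ('U') array can hold."""
    # NumPy stores 'U' strings as UCS-4, i.e. 4 bytes per code point.
    return arr.dtype.itemsize // 4

# Floats first, then strings (the first example from the question).
floats_first = np.array([5.3, 1.2, 76.1, 'Alice', 'Bob', 'Claire'])

# Strings only: the width is simply the length of the longest string.
strings_only = np.array(['Alice', 'Bob', 'Claire'])

print(floats_first.dtype, codepoints(floats_first))
print(strings_only.dtype, codepoints(strings_only))
```

For the strings-only array the width is 6, because 'Claire' is the longest entry.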
Now consider the second example. NumPy sees the strings first. The longest of them, Claire, needs six code points (not bits), so NumPy puts them into a 6-code-point data type (U6). It then continues along and sees 5.3, whose text form also fits in six code points, so no upgrading is required.
Similarly, when running
np.array(['Alice', 'Bob', 'Claire', 5.3, 1.2, 76.1, 'Bobby', 2.3000000000001])
it results in a U15, because NumPy sees 2.3000000000001, finds that the data type it is using is not large enough to hold it, and then upgrades it.
https://numpy.org/devdocs/reference/arrays.dtypes.html#arrays-dtypes
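The widening described above can be checked by comparing the length of each element's text form with the width NumPy ends up choosing. This is only a sketch: `needed_width` is a hypothetical helper, and it assumes `str(x)` gives the same text that NumPy stores for each float, which I would expect for these values but not necessarily in general. Some NumPy versions may choose a wider type than the minimum, so the comparison uses `>=` rather than `==`.

```python
import numpy as np

values = ['Alice', 'Bob', 'Claire', 5.3, 1.2, 76.1, 'Bobby', 2.3000000000001]

def needed_width(items):
    """Length of the longest text form among the items (hypothetical helper)."""
    return max(len(str(item)) for item in items)

arr = np.array(values)
chosen = arr.dtype.itemsize // 4  # 'U' dtypes use 4 bytes per code point

print("longest text form:", needed_width(values))  # '2.3000000000001' has 15 characters
print("dtype chosen by NumPy:", arr.dtype)
print("wide enough for every element:", chosen >= needed_width(values))
```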