I just wanted to confirm whether the default data type for strings is unicode when creating an ndarray. I could not find any reference that states this clearly. Maybe it is too obvious and doesn't need stating.
When dtype is specified:
>>> import numpy as np
>>> g = np.array([['a', 'b'],['c', 'd']], dtype='S')
>>> g
array([[b'a', b'b'],
       [b'c', b'd']],
      dtype='|S1')
Without specifying the dtype:
>>> g = np.array([['a', 'b'],['c', 'd']])
>>> g
array([['a', 'b'],
       ['c', 'd']],
      dtype='<U1')
Also, what does the literal b indicate when dtype is specified? As per the documentation, it indicates bool, which doesn't seem to be the case here.
Can someone please clarify?
b'...' means it's a byte string. The default dtype for arrays of strings depends on the kind of strings: unicode strings (Python 3 strings are unicode) get the dtype U, while Python 2 str or Python 3 bytes get the dtype S. You can find the explanation of dtypes in the NumPy documentation. However, in your first case you actually forced NumPy to convert the strings to bytes because you specified dtype='S'.
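To check this yourself, here is a minimal sketch, assuming Python 3 and a reasonably recent NumPy. The variable names (data, as_default, as_bytes, round_trip) are just made up for illustration. It also shows where the bool confusion probably comes from: 'b' is the dtype kind code for booleans, which is unrelated to the b prefix that Python puts in front of bytes literals in the repr.

```python
import numpy as np

data = [['a', 'b'], ['c', 'd']]

as_default = np.array(data)            # Python 3 str input, no dtype given
as_bytes = np.array(data, dtype='S')   # byte strings requested explicitly

# dtype.kind is a one-character code: 'U' = unicode, 'S' = byte string
print(as_default.dtype.kind)           # U
print(as_bytes.dtype.kind)             # S

# Elements of an 'S' array are bytes objects, hence the b'...' in the repr
print(isinstance(as_bytes[0, 0], bytes))   # True

# The kind code 'b' belongs to boolean dtypes, not to byte strings
print(np.dtype(bool).kind)             # b

# For ASCII data, casting back to 'U' recovers the original strings
round_trip = as_bytes.astype('U')
print((round_trip == as_default).all())    # True
```

So the b you see in the output is only how Python displays bytes objects; it says nothing about the array's dtype code.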