I'm storing ticks with ndarray; each tick has a utc_timestamp (str) as index and tick prices/volumes as values, so I have an array of 2 different dtypes (str and float). This is the way I store it as a np.recarray:
data = np.recarray((100,), dtype=[('time', 'U23'), ('ask1', 'f'), ('bid1', 'f')])
tick = ['2021-04-28T09:38:30.928',14.21,14.2]
# assigning this tick to the end of data, weird
%%timeit
...: data[-1] = np.rec.array(tick)
...:
1.38 ms ± 13.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
takes 1.38 ms per loop!! Plus, I can't set the last row using data[-1] = tick, which raises ValueError: setting an array element with a sequence.
Let's try a simple ndarray instead, say I have 2 separate arrays, one for str and one for float:
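The setup for this isn't shown in the question, so here is a minimal sketch of what the two-array version presumably looks like (the names `times` and the (2, 100) shape of `data` are guesses):

```python
import numpy as np

# one array per dtype: strings for timestamps, floats for prices
times = np.empty(100, dtype='U23')
data = np.zeros((2, 100))  # row 0 = ask1, row 1 = bid1

tick = ['2021-04-28T09:38:30.928', 14.21, 14.2]

times[-1] = tick[0]      # string assignment
data[:, -1] = tick[1:]   # float assignment, the statement timed below
```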
%%timeit
...: data[:,-1]=tick[1:]
...:
15.2 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
see? that's 90x faster! why is that?
My times are quite a bit better.

np.rec.array(tick) creates a dtype=[('f0', '<U23'), ('f1', '<f8'), ('f2', '<f8')]. I get better speed if I use the final dtype; the bulk of that time is creating the 1-term recarray.
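The answer's original timing code didn't survive, but roughly, the difference is between letting np.rec.array infer a dtype and supplying the final one (a sketch, reusing the question's setup):

```python
import numpy as np

dt = np.dtype([('time', 'U23'), ('ask1', 'f'), ('bid1', 'f')])
data = np.recarray((100,), dtype=dt)
tick = ['2021-04-28T09:38:30.928', 14.21, 14.2]

# slow: infers an intermediate dtype [('f0','<U23'),('f1','<f8'),('f2','<f8')]
data[-1] = np.rec.array(tick)

# faster: build the record with the target dtype directly (0-d structured array)
data[-1] = np.array(tuple(tick), dtype=dt)
```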
Making a structured array instead:
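(The code block here was lost; a minimal sketch of the structured-array version, following the question's dtype:)

```python
import numpy as np

dt = np.dtype([('time', 'U23'), ('ask1', 'f'), ('bid1', 'f')])
data = np.zeros(100, dtype=dt)   # plain structured array, no recarray wrapper
tick = ['2021-04-28T09:38:30.928', 14.21, 14.2]

# same element assignment, minus the recarray machinery
data[-1] = np.array(tuple(tick), dtype=dt)
```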
So, skipping recarray entirely: I think recarray has largely been replaced by structured arrays. The main thing recarray adds is the ability to address fields as attributes. Your example shows that recarray slows things down.

edit
The tuple tick can be assigned directly without extra conversion:
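(The code for this point was also lost; something like the following, assuming the question's dtype — note it must be a tuple, since a list triggers the ValueError shown in the question:)

```python
import numpy as np

dt = np.dtype([('time', 'U23'), ('ask1', 'f'), ('bid1', 'f')])
data = np.zeros(100, dtype=dt)
tick = ('2021-04-28T09:38:30.928', 14.21, 14.2)

data[-1] = tick   # a tuple assigns straight into one record, no conversion step
```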