KDB simple keyed table performance

294 Views Asked by At

I have the following table, about 3m rows. I want to know which approach/attributes I can use the squeeze maximum performance out of it. I will be selecting only on exact matches for inst + time.

price_hist_5min_bars:([inst:`$();time:`timestamp$()]price:`float$())

Also, does the approach change if the table is for reporting only or is changing? This table won't have any rows added to it on-the-fly but I will have another intra-day table with the same structure which will have rows added every 1 or 5 mins. All suggestions appreciated!

2

There are 2 best solutions below

1
On BEST ANSWER

I have tested with a sample database and I can get a slight improvement if I the use `p# attribute:

update `p#inst from `inst`time xasc price_hist_5min_bars

In this case `p# has advantages over `g# as the symbols are in order and not dispersed. See the wiki entry on attributes for more information.

Better performance can be achieved if you restructure the table. Using the following table as an example:

q)n:10000000
q)t:2!update `p#sym from `sym`time xasc ([]sym:n?-100?`3;time:.z.d+n?1D;price:n?1000f)

We can set a benchmark using select:

q)\ts:1000 select from t where sym=`bak,time=2017.11.29D23:59:59.520923942
210 1180496

You can restructure by grouping all of the time and price values for each sym, sorting on time:

q)show r:select `s#time,price by sym from t
sym| time                                                                                        ..
---| --------------------------------------------------------------------------------------------..
aan| `s#2017.11.29D00:00:00.131421536 2017.11.29D00:00:00.214382261 2017.11.29D00:00:00.914720445..
...

Working with this restructured data, we can return the price with:

q)r[`bak][`price] r[`bak][`time]?2017.11.29D23:59:59.520923942
948.3733
q)\ts:1000 r[`bak][`price] r[`bak][`time]?2017.11.29D23:59:59.520923942
4 1824

Where the index of the record within the sym grouping is given by:

q)r[`bak][`time]?2017.11.29D23:59:59.520923942
100638

Obviously the above example only returns and atom and not a table. If you wished to have a table you could try something like this:

q)flip (),/:@[;`sym;:;`bak]flip[r`bak]r[`bak][`time]?2017.11.29D23:59:59.520923942
time                          price    sym
------------------------------------------
2017.11.29D23:59:59.520923942 948.3733 bak
q)\ts:1000 flip (),/:@[;`sym;:;`bak]flip[r`bak]r[`bak][`time]?2017.11.29D23:59:59.520923942
7 2688

But it depends on how you want your output to look.

2
On

If this is in memory i would imagine you want a sorted attribute on the time field and a group attribute on inst field.

If this is for a very specific use case where you only want to extract values of inst+time you could look into creating a composite key of inst and time. The look up of this will be a lot faster as means you could use the unique attribute on that key. However if you're receiving the data in real time you'll also have to account for the processing time to create such a composite key.