Is there a more scalable sub-select alternative to this query?

237 Views Asked by At

I have a large table that contains a date field of type datetime. As part of a function that takes as input two lists of datetime namely a list of afroms and atos I'd like to compute for each of these afrom,ato pairs all the rows of the large table whose date is between them.

I worked out a not very efficient way to do this i.e. it has a serious scalability drawback:

/ t1 contains my afrom,ato pairs 
q)t1:([] afrom:`datetime$(2017.10.01T10:00:00.000 2017.10.02T10:00:00.000);ato:`datetime$(2017.10.01T12:00:00.000 2017.10.02T12:00:00.000));
q)t1
afrom                   ato                    
-----------------------------------------------
2017.10.01T10:00:00.000 2017.10.01T12:00:00.000
2017.10.02T10:00:00.000 2017.10.02T12:00:00.000

/ t2 contains my very very large dataset  
q)t2:([] date:`datetime$(2017.10.01T10:01:00.000 2017.10.01T10:02:00.000 2017.10.01T10:03:00.000 2017.10.02T10:01:00.000 2017.10.02T10:02:00.000 2017.10.02T10:03:00.000); ccypair:(3#`EURUSD),(3#`USDCHF); mid:6?1.05);
q)t2
date                    ccypair mid                 
----------------------------------------------------
2017.10.01T10:01:00.000 EURUSD  0.24256133290473372 
2017.10.01T10:02:00.000 EURUSD  0.091602176288142809
2017.10.01T10:03:00.000 EURUSD  0.10756538207642735 
2017.10.02T10:01:00.000 USDCHF  0.91046513157198206 
2017.10.02T10:02:00.000 USDCHF  0.76424539103172717 
2017.10.02T10:03:00.000 USDCHF  0.17090452200500295

Then I can use cross like this:

select from (t1 cross t2) where afrom<date,date<ato

and this produces the correct results:

afrom                   ato                     date                    ccypa..
-----------------------------------------------------------------------------..
2017.10.01T10:00:00.000 2017.10.01T12:00:00.000 2017.10.01T10:01:00.000 EURUS..
2017.10.01T10:00:00.000 2017.10.01T12:00:00.000 2017.10.01T10:02:00.000 EURUS..
2017.10.01T10:00:00.000 2017.10.01T12:00:00.000 2017.10.01T10:03:00.000 EURUS..
2017.10.02T10:00:00.000 2017.10.02T12:00:00.000 2017.10.02T10:01:00.000 USDCH..
2017.10.02T10:00:00.000 2017.10.02T12:00:00.000 2017.10.02T10:02:00.000 USDCH..
2017.10.02T10:00:00.000 2017.10.02T12:00:00.000 2017.10.02T10:03:00.000 USDCH..

However, when I have a large list of afroms and atos the cross will "unnecessarily" expand the potentially large table t2 times the size of t1 and that doesn't scale well.

Is there a better way to do this? e.g. I have tried things like:

select from t2 where (exec afrom from t1)<date,date<(exec ato from t1)
error: `length

I would need to do a loop I guess but not sure how. Subquestion .. is it possible to have a sigle list of interval tuples i.e. intervals(afrom;ato) instead of having separated afroms and atos?

1

There are 1 best solutions below

7
On BEST ANSWER

If I understand this correctly then one way to do this would be use each row of t1 and find where any row falls within each time range:

select from t2 where any date within/:value each t1
date                    ccypair mid
-----------------------------------------
2017.10.01T10:01:00.000 EURUSD  0.41239
2017.10.01T10:02:00.000 EURUSD  0.5429457
...

Which is the same as yoru example output above, without the afrom and ato columns. In this example within selects all values within a range and any and each-right /: allow you to use multiple ranges. If this does not scale particularly well for you then you can work on each row of t1 individually:

raze{[x;y]select from x where date within value y}[t2]'[t1]

Which should work if the windows do not overlap.

If you need exclusive or then you could try modifying the window times slightly to ensure they are excluded:

q)select from t2 where any date within/:value each @[t1;`afrom`ato;+;1 -1*00:00:00.001]
date                    ccypair mid
-----------------------------------------
2017.10.01T10:01:00.000 EURUSD  0.41239
2017.10.01T10:02:00.000 EURUSD  0.5429457
...

To add the afrom and ato columns to the output you can cross them with the selected rows:

raze{[x;y]flip[1#'y]cross select from x where date within value[y]+1 -1*00:00:00.001}[t2]'[t1]
afrom                   ato                     date                    ccypair mid
-----------------------------------------------------------------------------------------
2017.10.01T10:00:00.000 2017.10.01T12:00:00.000 2017.10.01T10:01:00.000 EURUSD  0.41239
2017.10.01T10:00:00.000 2017.10.01T12:00:00.000 2017.10.01T10:02:00.000 EURUSD  0.5429457
...