I have a large table that contains a date
field of type datetime
. As part of a function that takes as input two lists of datetime
namely a list of afroms
and atos
I'd like to compute for each of these afrom,ato
pairs all the rows of the large table whose date is between them.
I worked out a not very efficient way to do this i.e. it has a serious scalability drawback:
/ t1 contains my afrom,ato pairs
q)t1:([] afrom:`datetime$(2017.10.01T10:00:00.000 2017.10.02T10:00:00.000);ato:`datetime$(2017.10.01T12:00:00.000 2017.10.02T12:00:00.000));
q)t1
afrom ato
-----------------------------------------------
2017.10.01T10:00:00.000 2017.10.01T12:00:00.000
2017.10.02T10:00:00.000 2017.10.02T12:00:00.000
/ t2 contains my very very large dataset
q)t2:([] date:`datetime$(2017.10.01T10:01:00.000 2017.10.01T10:02:00.000 2017.10.01T10:03:00.000 2017.10.02T10:01:00.000 2017.10.02T10:02:00.000 2017.10.02T10:03:00.000); ccypair:(3#`EURUSD),(3#`USDCHF); mid:6?1.05);
q)t2
date ccypair mid
----------------------------------------------------
2017.10.01T10:01:00.000 EURUSD 0.24256133290473372
2017.10.01T10:02:00.000 EURUSD 0.091602176288142809
2017.10.01T10:03:00.000 EURUSD 0.10756538207642735
2017.10.02T10:01:00.000 USDCHF 0.91046513157198206
2017.10.02T10:02:00.000 USDCHF 0.76424539103172717
2017.10.02T10:03:00.000 USDCHF 0.17090452200500295
Then I can use cross
like this:
select from (t1 cross t2) where afrom<date,date<ato
and this produces the correct results:
afrom ato date ccypa..
-----------------------------------------------------------------------------..
2017.10.01T10:00:00.000 2017.10.01T12:00:00.000 2017.10.01T10:01:00.000 EURUS..
2017.10.01T10:00:00.000 2017.10.01T12:00:00.000 2017.10.01T10:02:00.000 EURUS..
2017.10.01T10:00:00.000 2017.10.01T12:00:00.000 2017.10.01T10:03:00.000 EURUS..
2017.10.02T10:00:00.000 2017.10.02T12:00:00.000 2017.10.02T10:01:00.000 USDCH..
2017.10.02T10:00:00.000 2017.10.02T12:00:00.000 2017.10.02T10:02:00.000 USDCH..
2017.10.02T10:00:00.000 2017.10.02T12:00:00.000 2017.10.02T10:03:00.000 USDCH..
However, when I have a large list of afroms
and atos
the cross
will "unnecessarily" expand the potentially large table t2
times the size of t1
and that doesn't scale well.
Is there a better way to do this? e.g. I have tried things like:
select from t2 where (exec afrom from t1)<date,date<(exec ato from t1)
error: `length
I would need to do a loop I guess but not sure how. Subquestion .. is it possible to have a sigle list of interval tuples i.e. intervals(afrom;ato)
instead of having separated afroms
and atos
?
If I understand this correctly then one way to do this would be use each row of
t1
and find where any row fallswithin
each time range:Which is the same as yoru example output above, without the
afrom
andato
columns. In this example within selects all values within a range andany
and each-right/:
allow you to use multiple ranges. If this does not scale particularly well for you then you can work on each row oft1
individually:Which should work if the windows do not overlap.
If you need exclusive or then you could try modifying the window times slightly to ensure they are excluded:
To add the
afrom
andato
columns to the output you can cross them with the selected rows: