I have two tables A and B. I want to join them based on their validity time intervals.
A has product quality (irregular times) and B has hourly settings during the production period. I need to create a table like C that includes the parameters p1 and p2 for all A's RefDates that fall in the time range of B's ValidFrom ValidTo.
A
RefDate result
'11-Oct-2017 00:14:00' 17
'11-Oct-2017 00:14:00' 19
'11-Oct-2017 00:20:00' 5
'11-Oct-2017 01:30:00' 25
'11-Oct-2017 01:30:00' 18
'11-Oct-2017 03:03:00' 28
B
ValidFrom ValidTo p1 p2
'11-Oct-2017 00:13:00' '11-Oct-2017 01:12:59' 2 1
'11-Oct-2017 01:13:00' '11-Oct-2017 02:12:59' 3 1
'11-Oct-2017 02:13:00' '11-Oct-2017 03:12:59' 4 5
'11-Oct-2017 03:13:00' '11-Oct-2017 04:12:59' 6 1
'11-Oct-2017 04:13:00' '11-Oct-2017 05:12:59' 7 9
I need to get something like this.
C
RefDate res p1 p2
'11-Oct-2017 00:14:00' 17 2 1
'11-Oct-2017 00:14:00' 19 2 1
'11-Oct-2017 00:20:00' 5 2 1
'11-Oct-2017 01:30:00' 25 3 1
'11-Oct-2017 01:30:00' 18 3 1
'11-Oct-2017 03:03:00' 28 4 5
I know how to do this in SQL and I think I have figured out how to do this row by row in MatLab but this is horribly slow. The data set is rather large. I just assume there must be a more elegant way that I just couldn't find.
Something that caused many of my approaches to fail is that the RefDate column is not unique.
edit: the real tables have thousands of rows and hundreds of variables.
C (in reality)
RefDate res res2 ... res200 p1 p2 ... p1000
11-Oct-2017 00:14:00 17 2 1
11-Oct-2017 00:14:00 19 2 1
11-Oct-2017 00:20:00 5 2 1
11-Oct-2017 01:30:00 25 3 1
11-Oct-2017 01:30:00 18 3 1
11-Oct-2017 03:03:00 28 4 5
This can actually be done in a single line of code. Assuming your
ValidTo
value always ends immediately before theValidFrom
in the next row (which it does in your example), you only need to use yourValidFrom
values. First, convert those and yourRefDate
values to serial date numbers usingdatenum
. Then use thediscretize
function to bin theRefDate
values using theValidFrom
values as the edges, which will give you the row index inB
that contains each time inA
. Then use that index to extract thep1
andp2
values and append them toA
:The above solution should work for any number of columns
pN
inB
.If there are any times in
A
that don't fall in any of the ranges inB
, you will have to break the solution into multiple lines so you can check whether or not the index returned fromdiscretize
containsNaN
values. Assuming you want to exclude those rows fromC
, this would be the new solution: