How to join tables in Matlab (2018) by matching time intervals?

159 Views Asked by At

I have two tables A and B. I want to join them based on their validity time intervals.

A has product quality (irregular times) and B has hourly settings during the production period. I need to create a table like C that includes the parameters p1 and p2 for all A's RefDates that fall in the time range of B's ValidFrom ValidTo.

A
RefDate                 result
'11-Oct-2017 00:14:00'  17
'11-Oct-2017 00:14:00'  19
'11-Oct-2017 00:20:00'  5
'11-Oct-2017 01:30:00'  25
'11-Oct-2017 01:30:00'  18
'11-Oct-2017 03:03:00'  28


B
ValidFrom               ValidTo                 p1  p2
'11-Oct-2017 00:13:00'  '11-Oct-2017 01:12:59'  2   1
'11-Oct-2017 01:13:00'  '11-Oct-2017 02:12:59'  3   1
'11-Oct-2017 02:13:00'  '11-Oct-2017 03:12:59'  4   5
'11-Oct-2017 03:13:00'  '11-Oct-2017 04:12:59'  6   1
'11-Oct-2017 04:13:00'  '11-Oct-2017 05:12:59'  7   9

I need to get something like this.

C
RefDate                 res p1  p2
'11-Oct-2017 00:14:00'  17  2   1
'11-Oct-2017 00:14:00'  19  2   1
'11-Oct-2017 00:20:00'  5   2   1
'11-Oct-2017 01:30:00'  25  3   1
'11-Oct-2017 01:30:00'  18  3   1
'11-Oct-2017 03:03:00'  28  4   5

I know how to do this in SQL and I think I have figured out how to do this row by row in MatLab but this is horribly slow. The data set is rather large. I just assume there must be a more elegant way that I just couldn't find.

Something that caused many of my approaches to fail is that the RefDate column is not unique.

edit: the real tables have thousands of rows and hundreds of variables.

C (in reality)
RefDate                 res res2 ... res200 p1  p2 ... p1000
11-Oct-2017 00:14:00    17                  2   1
11-Oct-2017 00:14:00    19                  2   1
11-Oct-2017 00:20:00    5                   2   1
11-Oct-2017 01:30:00    25                  3   1
11-Oct-2017 01:30:00    18                  3   1
11-Oct-2017 03:03:00    28                  4   5
2

There are 2 best solutions below

2
On BEST ANSWER

This can actually be done in a single line of code. Assuming your ValidTo value always ends immediately before the ValidFrom in the next row (which it does in your example), you only need to use your ValidFrom values. First, convert those and your RefDate values to serial date numbers using datenum. Then use the discretize function to bin the RefDate values using the ValidFrom values as the edges, which will give you the row index in B that contains each time in A. Then use that index to extract the p1 and p2 values and append them to A:

>> C = [A B(discretize(datenum(A.RefDate), datenum(B.ValidFrom)), 3:end)]

C = 

           RefDate            result    p1    p2
    ______________________    ______    __    __

    '11-Oct-2017 00:14:00'    17        2     1 
    '11-Oct-2017 00:14:00'    19        2     1 
    '11-Oct-2017 00:20:00'     5        2     1 
    '11-Oct-2017 01:30:00'    25        3     1 
    '11-Oct-2017 01:30:00'    18        3     1 
    '11-Oct-2017 03:03:00'    28        4     5 

The above solution should work for any number of columns pN in B.

If there are any times in A that don't fall in any of the ranges in B, you will have to break the solution into multiple lines so you can check whether or not the index returned from discretize contains NaN values. Assuming you want to exclude those rows from C, this would be the new solution:

index = discretize(datenum(A.RefDate), datenum(B.ValidFrom));
C = [A(~isnan(index), :) B(index(~isnan(index)), 3:end)];
4
On

The following code does exactly what you are asking for:

% convert to datetime
A.RefDate = datetime(A.RefDate);
B.ValidFrom = datetime(B.ValidFrom);
B.ValidTo = datetime(B.ValidTo);

% for each row in A, find the matching row in B
i = cellfun(@find, arrayfun(@(x) (x >= B.ValidFrom) & (x <= B.ValidTo), A.RefDate, 'UniformOutput', false), 'UniformOutput', false);

% find rows in A that where not matched
j = cellfun(@isempty, i, 'UniformOutput', false);

% build the result
C = [B(cell2mat(i),:) A(~cell2mat(j),:)];

% display output
C