Which row (in order by some column) in a table corresponds to a row in another table?

Question

Which row (in order by some column) in a table corresponds to a row in another table?

159 Views Asked by msh210 At 13 January 2012 at 17:42

I have two tables, as follows (simplified from actual):

mysql> desc small_table;
+-----------------+---------------+------+-----+---------+-------+
| Field           | Type          | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| event_time      | datetime      | NO   |     | NULL    |       |
| user_id         | char(15)      | NO   |     | NULL    |       |
| other_data      | int(11)       | NO   | MUL | NULL    |       |
+-----------------+---------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> desc large_table;
+-----------------+---------------+------+-----+---------+-------+
| Field           | Type          | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| event_time      | datetime      | NO   |     | NULL    |       |
| user_id         | char(15)      | NO   |     | NULL    |       |
| other_data      | int(11)       | NO   |     | NULL    |       |
+-----------------+---------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

Now, small_table is, well, small: for each user_id there is usually only one row (though there are sometimes more). In large_table, on the other hand, each user_id appears numerous times.

mysql> select count(1) from small_table\G
*************************** 1. row ***************************
count(1): 20182
1 row in set (0.00 sec)


mysql> select count(1) from large_table\G
*************************** 1. row ***************************
count(1): 2870522
1 row in set (0.00 sec)

However, and this is important, for each row in small_table, there is at least one row in large_table with the same user_id, the same other_data, and similar event_time (the same within a few minutes, say).

I want to know whether small_table has a row corresponding to the first, or the second, or the whatever^th distinct row in large_table for the same user_id and similar event_time. That is, I'd like:

for each user_id, a count of distinct rows of large_table in order by event_time, but only for event_time within, say, three hours; that is, I seek only the count of such rows as have event_time within, say, three hours of one another; and
for each such collection of distinct rows, an identification of which row in that list (in order by event_time) has a corresponding row in small_table.

I don't seem able to write even a query that will do the first step, let alone a query that will do the second, and would appreciate any direction.

Original Q&A

There are 3 best solutions below

**johnny** · Answer 1 · 2012-01-13T17:55:16.327000

select count(s.user_id), s.event_time, s.other_data from small_table s
where s.user_id IN (select distinct user_id from big_table where event_time between 'StartDate' and 'EndDate')
order by s.event_time

I'm not sure what you're asking for in the small margin you mentioned.

also:

select * from large_table t1, large_table t2 
where t1.event_time <= date_sub(t2.event_time, INTERVAL 3 hour)

So, try:

  select count(s.user_id), s.event_time, s.other_data from small_table s
    where s.user_id IN ( select * from large_table t1, large_table t2 
    where t1.event_time <= date_sub(t2.event_time, INTERVAL 3 hour))
order by s.event_time

**Jonathan Leffler** · Answer 2 · 2012-01-15T07:04:57.770000

The necessary SQL for this is brutal; it will give your optimizer a really rather serious workout.

Judging from the comments after the question as well as the question, the desire is to treat sequences of events for a given user ID in the large table as 'contiguous' if they all fall within some fixed interval between adjacent events. For the sake of example, the fixed interval will be 3 hours. I'm coding for IBM Informix Dynamic Server (for sake of argument, version 11.70, but 11.50 would also work fine). This means that there is an idiosyncratic notation I need to explain. Specifically, the notation 3 UNITS HOUR denotes an interval of 3 hours; it could also be written INTERVAL(3) HOUR TO HOUR in the Informix dialect of SQL, or as INTERVAL '3' HOUR in standard SQL.

There are a couple of crucial techniques in generating SQL, especially complex SQL. One is to build the SQL up in steps, piecemeal, assembling the final result. The other is to ensure that you have a clear specification of what it is you are after.

In the notation that follows, the qualification 'for the same User_ID' should be taken as always being part of the expression.

In the large table, there are three categories of range that we want to consider before joining with the small table.

Entries where there is neither a row with an event time before the event that is close enough nor a row with an event time after the event that is close enough. This is a time range with the same start and end time.
A pair of entries in the table that are themselves close enough but for which there is neither an event earlier than the early event of the pair that is close enough nor an event later than the late event of the pair that is close enough nor an event in between the pair.
A sequence of three or more entries in the table for which there is:
- No event E1 earlier than the earliest that is close enough
- No event E2 later than the latest that is close enough
- An event E3 later than the earliest that is close enough to the earliest
- An event E4 earlier than the latest that is close enough to the latest (E4 might possibly be the same event as E3)
- No pair of events E5, E6 between the earliest and the latest where there is no event in between E5 and E6 but the gap between E5 and E6 is too large to count.

As you can see from the description, this is going to be some scary SQL!

NB: The code has now been tested; some (mainly small) changes were necessary. One minor unnecessary change was the addition of ORDER BY clauses on intermediate queries. Another unnecessary change was to select the other data from the small table for verification purposes. This revision was made without studying the amended version posted by msh210.

Also note that I'm far from certain this is a minimal formulation; it may be feasible to classify all the ranges with a single SELECT statement instead of a UNION of three SELECT statements (and it would be good if that is the case).

Singleton ranges

-- Ranges of exactly 1 event
SELECT lt1.user_id, lt1.event_time AS min_time, lt1.event_time AS max_time
  FROM Large_Table AS lt1
 WHERE NOT EXISTS -- an earlier event that is close enough
       (SELECT *
          FROM Large_Table AS lt3
         WHERE lt1.user_id = lt3.user_id
           AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
           AND lt3.event_time < lt1.event_time
       )
   AND NOT EXISTS -- a later event that is close enough
       (SELECT *
          FROM Large_Table AS lt4
         WHERE lt1.user_id = lt4.user_id
           AND lt4.event_time > lt1.event_time
           AND lt4.event_time < lt1.event_time + 3 UNITS HOUR
       )
ORDER BY User_ID, Min_Time;

Doubleton Ranges

-- Ranges of exactly 2 events
SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
  FROM Large_Table AS lt1
  JOIN Large_Table AS lt2
    ON lt1.user_id = lt2.user_id
   AND lt1.event_time < lt2.event_time
   AND lt2.event_time < lt1.event_time + 3 UNITS HOUR
 WHERE NOT EXISTS -- an earlier event that is close enough
       (SELECT *
          FROM Large_Table AS lt3
         WHERE lt1.user_id = lt3.user_id
           AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
           AND lt3.event_time < lt1.event_time
       )
   AND NOT EXISTS -- a later event that is close enough
       (SELECT *
          FROM Large_Table AS lt4
         WHERE lt1.user_id = lt4.user_id
           AND lt4.event_time > lt2.event_time
           AND lt4.event_time < lt2.event_time + 3 UNITS HOUR
       )
   AND NOT EXISTS -- any event in between
       (SELECT *
          FROM Large_Table AS lt5
         WHERE lt1.user_id = lt5.user_id
           AND lt5.event_time > lt1.event_time
           AND lt5.event_time < lt2.event_time
       )
ORDER BY User_ID, Min_Time;

Added 3 hour criterion to outer WHERE clause.

Multiple Event Ranges

-- Ranges of 3 or more events
SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
  FROM Large_Table AS lt1
  JOIN Large_Table AS lt2
    ON lt1.user_id = lt2.user_id
   AND lt1.event_time < lt2.event_time
 WHERE NOT EXISTS -- an earlier event that is close enough
       (SELECT *
          FROM Large_Table AS lt3
         WHERE lt1.user_id = lt3.user_id
           AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
           AND lt3.event_time < lt1.event_time
       )
   AND NOT EXISTS -- a later event that is close enough
       (SELECT *
          FROM Large_Table AS lt4
         WHERE lt1.user_id = lt4.user_id
           AND lt4.event_time > lt2.event_time
           AND lt4.event_time < lt2.event_time + 3 UNITS HOUR
       )
   AND NOT EXISTS -- a gap that's too big in the events between first and last
       (SELECT *
          FROM Large_Table AS lt5 -- E5 before E6
          JOIN Large_Table AS lt6
            ON lt5.user_id = lt6.user_id
           AND lt5.event_time < lt6.event_time
         WHERE lt1.user_id = lt5.user_id
           AND lt6.event_time < lt2.event_time
           AND lt5.event_time > lt1.event_time
           AND (lt6.event_time - lt5.event_time) > 3 UNITS HOUR
           AND NOT EXISTS -- an event in between these two
               (SELECT *
                  FROM Large_Table AS lt9
                 WHERE lt5.user_id = lt9.user_id
                   AND lt9.event_time > lt5.event_time
                   AND lt9.event_time < lt6.event_time
               )
       )
   AND EXISTS -- an event close enough after the start
       (SELECT *
          FROM Large_Table AS lt7
         WHERE lt1.user_id = lt7.user_id
           AND lt1.event_time < lt7.event_time
           AND lt7.event_time < lt1.event_time + 3 UNITS HOUR
           AND lt7.event_time < lt2.event_time
       )
   AND EXISTS -- an event close enough before the end
       (SELECT *
          FROM Large_Table AS lt8
         WHERE lt2.user_id = lt8.user_id
           AND lt2.event_time > lt8.event_time
           AND lt8.event_time > lt2.event_time - 3 UNITS HOUR
           AND lt8.event_time > lt1.event_time
       )
ORDER BY User_ID, Min_Time;

Added omitted nested NOT EXISTS clause in the 'big gaps' sub-query.

All Ranges in Large Table

Clearly, the complete list of ranges in the last table are the union of the three queries above.

Query deleted as not interesting enough. It is simply the 3-way UNION of the separate queries above.

Final Query

The final query finds the ranges, if any, in the result of the gruesome 3-part UNION which is close enough to the entry in the small table. A single entry in the small table might fall at, say, 13:00, and there might be a range in the large table that ends at 11:00 and another that starts at 15:00. The two ranges from the large table are separate (the gap between them is 4 hours), but the entry in the small table is close enough to both to count. [The tests cover this case.]

SELECT S.User_id, S.Event_Time, L.Min_Time, L.Max_Time, S.Other_Data
  FROM Small_Table AS S
  JOIN (
        -- Ranges of exactly 1 event
        SELECT lt1.user_id, lt1.event_time AS min_time, lt1.event_time AS max_time
          FROM Large_Table AS lt1
         WHERE NOT EXISTS -- an earlier event that is close enough
               (SELECT *
                  FROM Large_Table AS lt3
                 WHERE lt1.user_id = lt3.user_id
                   AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
                   AND lt3.event_time < lt1.event_time
               )
           AND NOT EXISTS -- a later event that is close enough
               (SELECT *
                  FROM Large_Table AS lt4
                 WHERE lt1.user_id = lt4.user_id
                   AND lt4.event_time > lt1.event_time
                   AND lt4.event_time < lt1.event_time + 3 UNITS HOUR
               )
        UNION
        -- Ranges of exactly 2 events
        SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
          FROM Large_Table AS lt1
          JOIN Large_Table AS lt2
            ON lt1.user_id = lt2.user_id
           AND lt1.event_time < lt2.event_time
           AND lt2.event_time < lt1.event_time + 3 UNITS HOUR
         WHERE NOT EXISTS -- an earlier event that is close enough
               (SELECT *
                  FROM Large_Table AS lt3
                 WHERE lt1.user_id = lt3.user_id
                   AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
                   AND lt3.event_time < lt1.event_time
               )
           AND NOT EXISTS -- a later event that is close enough
               (SELECT *
                  FROM Large_Table AS lt4
                 WHERE lt1.user_id = lt4.user_id
                   AND lt4.event_time > lt2.event_time
                   AND lt4.event_time < lt2.event_time + 3 UNITS HOUR
               )
           AND NOT EXISTS -- any event in between
               (SELECT *
                  FROM Large_Table AS lt5
                 WHERE lt1.user_id = lt5.user_id
                   AND lt5.event_time > lt1.event_time
                   AND lt5.event_time < lt2.event_time
               )
        UNION
        -- Ranges of 3 or more events
        SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
          FROM Large_Table AS lt1
          JOIN Large_Table AS lt2
            ON lt1.user_id = lt2.user_id
           AND lt1.event_time < lt2.event_time
         WHERE NOT EXISTS -- an earlier event that is close enough
               (SELECT *
                  FROM Large_Table AS lt3
                 WHERE lt1.user_id = lt3.user_id
                   AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
                   AND lt3.event_time < lt1.event_time
               )
           AND NOT EXISTS -- a later event that is close enough
               (SELECT *
                  FROM Large_Table AS lt4
                 WHERE lt1.user_id = lt4.user_id
                   AND lt4.event_time > lt2.event_time
                   AND lt4.event_time < lt2.event_time + 3 UNITS HOUR
               )
           AND NOT EXISTS -- a gap that's too big in the events between first and last
               (SELECT *
                  FROM Large_Table AS lt5 -- E5 before E6
                  JOIN Large_Table AS lt6
                    ON lt5.user_id = lt6.user_id
                   AND lt5.event_time < lt6.event_time
                 WHERE lt1.user_id = lt5.user_id
                   AND lt6.event_time < lt2.event_time
                   AND lt5.event_time > lt1.event_time
                   AND (lt6.event_time - lt5.event_time) > 3 UNITS HOUR
                   AND NOT EXISTS -- an event in between these two
                       (SELECT *
                          FROM Large_Table AS lt9
                         WHERE lt5.user_id = lt9.user_id
                           AND lt9.event_time > lt5.event_time
                           AND lt9.event_time < lt6.event_time
                       )
               )
           AND EXISTS -- an event close enough after the start
               (SELECT *
                  FROM Large_Table AS lt7
                 WHERE lt1.user_id = lt7.user_id
                   AND lt1.event_time < lt7.event_time
                   AND lt7.event_time < lt1.event_time + 3 UNITS HOUR
                   AND lt7.event_time < lt2.event_time
               )
           AND EXISTS -- an event close enough before the end
               (SELECT *
                  FROM Large_Table AS lt8
                 WHERE lt2.user_id = lt8.user_id
                   AND lt2.event_time > lt8.event_time
                   AND lt8.event_time > lt2.event_time - 3 UNITS HOUR
                   AND lt8.event_time > lt1.event_time
               )
       ) AS L
    ON S.User_ID = L.User_ID
 WHERE S.Event_Time > L.Min_Time - 3 UNITS HOUR
   AND S.Event_Time < L.Max_Time + 3 UNITS HOUR
ORDER BY User_ID, Event_Time, Min_Time;

~~OK - fair warning; the SQL has not actually been anywhere near an SQL DBMS.~~

The code has now been tested. The infinitesimal chance was actually zero; there was a syntax error and a couple of more or less minor problems to resolve.

I experimented in stages after devising the test data. I used the 'Alpha' data (see below) while validating and fixing the queries, and added the Beta data only to ensure that there was no cross-talk between different User_ID values.

I used explicit < and > operations rather than BETWEEN ... AND to exclude the end points; if you want events exactly 3 hours apart to count as 'close enough', then you need to review each inequality, possibly changing them to BETWEEN ... AND or possibly just using >= or <= as appropriate.

There's an answer to a loosely similar but rather simpler question that (a) I wrote and (b) provided some helpful thoughts on the complex processing above (in particular, the 'no event earlier but close enough' and 'no event later but close enough' criteria. The 'close enough' criteria most definitely complicate this question.

Test Data

Large Table

CREATE TABLE Large_Table
(
    Event_Time  DATETIME YEAR TO MINUTE NOT NULL,
    User_ID     CHAR(15) NOT NULL,
    Other_Data  INTEGER NOT NULL,
    PRIMARY KEY(User_ID, Event_Time)
);

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 09:15', 'Alpha',  1) { R4 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 11:15', 'Alpha',  2) { R4 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 13:15', 'Alpha',  3) { R4 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 15:15', 'Alpha',  4) { R4 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 12:17', 'Beta',   1) { R4 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 09:15', 'Alpha',  5) { R1 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 10:17', 'Beta',   2) { R1 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-03 09:15', 'Alpha',  6) { R2 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-03 11:15', 'Alpha',  7) { R2 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-03 10:17', 'Beta',   3) { R1 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 09:15', 'Alpha',  8) { R3 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 11:15', 'Alpha',  9) { R3 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 13:15', 'Alpha', 10) { R3 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 10:17', 'Beta',   4) { R1 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 09:15', 'Alpha', 11) { R2 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 11:15', 'Alpha', 12) { R2 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 10:17', 'Beta',   5) { R1 };
{ Probe here }
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 15:15', 'Alpha', 13) { R2 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 17:15', 'Alpha', 14) { R2 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 16:17', 'Beta',   6) { R1 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 09:15', 'Alpha', 15) { R6 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 11:15', 'Alpha', 16) { R6 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 13:15', 'Alpha', 17) { R6 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 15:15', 'Alpha', 18) { R6 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 17:15', 'Alpha', 19) { R6 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 19:15', 'Alpha', 20) { R6 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 16:17', 'Beta',   7) { R1 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 09:15', 'Alpha', 21) { R1 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 11:17', 'Beta',   8) { R1 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 12:15', 'Alpha', 22) { R1 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 13:17', 'Beta',   9) { R1 };

INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 09:15', 'Alpha', 23) { R5 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 11:15', 'Alpha', 24) { R5 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 13:15', 'Alpha', 25) { R5 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 15:15', 'Alpha', 26) { R5 };
INSERT INTO Large_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 17:15', 'Alpha', 27) { R5 };

Small Table

Note: for the purposes of testing, the small table actually contains more rows than the large table. The rows in the small table with Other_Data values larger than 100 should not appear in the results (and don't). The tests here do poke at the edge conditions.

CREATE TABLE Small_Table
(
    Event_Time  DATETIME YEAR TO MINUTE NOT NULL,
    User_ID     CHAR(15) NOT NULL,
    Other_Data  INTEGER NOT NULL,
    PRIMARY KEY(User_ID, Event_Time)
);

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 06:15', 'Alpha', 131) { XX };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 06:20', 'Alpha',  31) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 10:20', 'Alpha',  32) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 13:20', 'Alpha',  33) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 15:20', 'Alpha',  34) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 18:15', 'Alpha', 134) { XX };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 06:15', 'Alpha', 135) { XX };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 06:16', 'Alpha',  35) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 10:20', 'Alpha',  35) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 12:14', 'Alpha',  35) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 12:15', 'Alpha', 135) { XX };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-03 09:20', 'Alpha',  36) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-03 11:20', 'Alpha',  37) { YY };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 09:20', 'Alpha',  38) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 11:20', 'Alpha',  39) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 13:20', 'Alpha',  40) { YY };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 09:20', 'Alpha',  41) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 11:20', 'Alpha',  42) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 13:20', 'Alpha',  42) { 22 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 15:20', 'Alpha',  43) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 17:20', 'Alpha',  44) { YY };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 09:20', 'Alpha',  45) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 11:20', 'Alpha',  46) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 13:20', 'Alpha',  47) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 15:20', 'Alpha',  48) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 17:20', 'Alpha',  49) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 19:20', 'Alpha',  50) { YY };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 09:20', 'Alpha',  51) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 10:20', 'Alpha',  51) { 22 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 12:20', 'Alpha',  52) { YY };

INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 09:20', 'Alpha',  53) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 11:20', 'Alpha',  54) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 13:20', 'Alpha',  55) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 15:20', 'Alpha',  56) { YY };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-08 17:20', 'Alpha',  57) { YY };


INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 13:27', 'Beta',   9) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-07 11:27', 'Beta',   8) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-06 16:27', 'Beta',   7) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 16:27', 'Beta',   6) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-05 10:27', 'Beta',   5) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-04 10:27', 'Beta',   4) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-03 10:27', 'Beta',   3) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-02 10:27', 'Beta',   2) { R1 };
INSERT INTO Small_Table(Event_Time, User_ID, Other_Data) VALUES('2012-01-01 12:27', 'Beta',   1) { R4 };

Final Query Results

Using the data above, the results obtained were:

Alpha   2012-01-01 06:20   2012-01-01 09:15   2012-01-01 15:15   31
Alpha   2012-01-01 10:20   2012-01-01 09:15   2012-01-01 15:15   32
Alpha   2012-01-01 13:20   2012-01-01 09:15   2012-01-01 15:15   33
Alpha   2012-01-01 15:20   2012-01-01 09:15   2012-01-01 15:15   34
Alpha   2012-01-02 06:16   2012-01-02 09:15   2012-01-02 09:15   35
Alpha   2012-01-02 10:20   2012-01-02 09:15   2012-01-02 09:15   35
Alpha   2012-01-02 12:14   2012-01-02 09:15   2012-01-02 09:15   35
Alpha   2012-01-03 09:20   2012-01-03 09:15   2012-01-03 11:15   36
Alpha   2012-01-03 11:20   2012-01-03 09:15   2012-01-03 11:15   37
Alpha   2012-01-04 09:20   2012-01-04 09:15   2012-01-04 13:15   38
Alpha   2012-01-04 11:20   2012-01-04 09:15   2012-01-04 13:15   39
Alpha   2012-01-04 13:20   2012-01-04 09:15   2012-01-04 13:15   40
Alpha   2012-01-05 09:20   2012-01-05 09:15   2012-01-05 11:15   41
Alpha   2012-01-05 11:20   2012-01-05 09:15   2012-01-05 11:15   42
Alpha   2012-01-05 13:20   2012-01-05 09:15   2012-01-05 11:15   42
Alpha   2012-01-05 13:20   2012-01-05 15:15   2012-01-05 17:15   42
Alpha   2012-01-05 15:20   2012-01-05 15:15   2012-01-05 17:15   43
Alpha   2012-01-05 17:20   2012-01-05 15:15   2012-01-05 17:15   44
Alpha   2012-01-06 09:20   2012-01-06 09:15   2012-01-06 19:15   45
Alpha   2012-01-06 11:20   2012-01-06 09:15   2012-01-06 19:15   46
Alpha   2012-01-06 13:20   2012-01-06 09:15   2012-01-06 19:15   47
Alpha   2012-01-06 15:20   2012-01-06 09:15   2012-01-06 19:15   48
Alpha   2012-01-06 17:20   2012-01-06 09:15   2012-01-06 19:15   49
Alpha   2012-01-06 19:20   2012-01-06 09:15   2012-01-06 19:15   50
Alpha   2012-01-07 09:20   2012-01-07 09:15   2012-01-07 09:15   51
Alpha   2012-01-07 09:20   2012-01-07 12:15   2012-01-07 12:15   51
Alpha   2012-01-07 10:20   2012-01-07 09:15   2012-01-07 09:15   51
Alpha   2012-01-07 10:20   2012-01-07 12:15   2012-01-07 12:15   51
Alpha   2012-01-07 12:20   2012-01-07 12:15   2012-01-07 12:15   52
Alpha   2012-01-08 09:20   2012-01-08 09:15   2012-01-08 17:15   53
Alpha   2012-01-08 11:20   2012-01-08 09:15   2012-01-08 17:15   54
Alpha   2012-01-08 13:20   2012-01-08 09:15   2012-01-08 17:15   55
Alpha   2012-01-08 15:20   2012-01-08 09:15   2012-01-08 17:15   56
Alpha   2012-01-08 17:20   2012-01-08 09:15   2012-01-08 17:15   57
Beta    2012-01-01 12:27   2012-01-01 12:17   2012-01-01 12:17    1
Beta    2012-01-02 10:27   2012-01-02 10:17   2012-01-02 10:17    2
Beta    2012-01-03 10:27   2012-01-03 10:17   2012-01-03 10:17    3
Beta    2012-01-04 10:27   2012-01-04 10:17   2012-01-04 10:17    4
Beta    2012-01-05 10:27   2012-01-05 10:17   2012-01-05 10:17    5
Beta    2012-01-05 16:27   2012-01-05 16:17   2012-01-05 16:17    6
Beta    2012-01-06 16:27   2012-01-06 16:17   2012-01-06 16:17    7
Beta    2012-01-07 11:27   2012-01-07 11:17   2012-01-07 13:17    8
Beta    2012-01-07 13:27   2012-01-07 11:17   2012-01-07 13:17    9

Intermediate results

Slightly different formatting in effect.

Singleton ranges

Alpha|2012-01-02 09:15|2012-01-02 09:15
Alpha|2012-01-07 09:15|2012-01-07 09:15
Alpha|2012-01-07 12:15|2012-01-07 12:15
Beta|2012-01-01 12:17|2012-01-01 12:17
Beta|2012-01-02 10:17|2012-01-02 10:17
Beta|2012-01-03 10:17|2012-01-03 10:17
Beta|2012-01-04 10:17|2012-01-04 10:17
Beta|2012-01-05 10:17|2012-01-05 10:17
Beta|2012-01-05 16:17|2012-01-05 16:17
Beta|2012-01-06 16:17|2012-01-06 16:17

Doubleton Ranges

Alpha|2012-01-03 09:15|2012-01-03 11:15
Alpha|2012-01-05 09:15|2012-01-05 11:15
Alpha|2012-01-05 15:15|2012-01-05 17:15
Beta|2012-01-07 11:17|2012-01-07 13:17

Multiple Event Ranges

Alpha|2012-01-01 09:15|2012-01-01 15:15
Alpha|2012-01-04 09:15|2012-01-04 13:15
Alpha|2012-01-06 09:15|2012-01-06 19:15
Alpha|2012-01-08 09:15|2012-01-08 17:15

**msh210** · Answer 3 · 2012-01-15T18:23:43.943000

This should perhaps be a comment on Jonathan Leffler's detailed and helpful answer but (a) it's too long and (b) it does help answer my question, so I'm posting it as an answer.

The code titled "Multiple Event Ranges" in Jonathan Leffler's answer finds ranges where a second instance is soon after the first, and a penultimate instance is soon before the last, and no big breaks appear, but bars any big gap between interior instances, even if there are otherinstances between them. So, for example, if the limit is 3 hours, instances at 1, 2, 4, 6, and 7 would be barred because of the gap between 2 and 6. I think the correct code would instead be (building directly on Jonathan Leffler's):

SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
  FROM Large_Table AS lt1
  JOIN Large_Table AS lt2
    ON lt1.user_id = lt2.user_id
   AND lt1.event_time < lt2.event_time
 WHERE NOT EXISTS -- an earlier event that is close enough
       (SELECT *
          FROM Large_Table AS lt3
         WHERE lt1.user_id = lt3.user_id
           AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
           AND lt3.event_time < lt1.event_time
       )
   AND NOT EXISTS -- a later event that is close enough
       (SELECT *
          FROM Large_Table AS lt4
         WHERE lt1.user_id = lt4.user_id
           AND lt4.event_time > lt2.event_time
           AND lt4.event_time < lt2.event_time + 3 UNITS HOUR
       )
   AND NOT EXISTS -- a gap that's too big in the events between first and last
       (SELECT *
          FROM Large_Table AS lt5 -- E5 before E6
          JOIN Large_Table AS lt6
            ON lt5.user_id = lt6.user_id
           AND lt1.user_id = lt5.user_id
           AND lt5.event_time < lt6.event_time
           AND lt6.event_time <= lt2.event_time
           AND lt5.event_time >= lt1.event_time
           AND (lt6.event_time - lt5.event_time) > 3 UNITS HOUR
           and not exists (
             select * from large_table as lt9 
               where lt9.event_time > lt5.event_time
                 and lt6.event_time > lt9.event_time
             )
       )

which obviates the need for the last two and existss in the code titled "Multiple Event Ranges" in Jonathan Leffler's answer and, indeed, obviates the need for the "Singleton ranges" and "Doubleton ranges" code in his answer.

Unless I'm missing something.

Which row (in order by some column) in a table corresponds to a row in another table?

There are 3 best solutions below

Singleton ranges

Doubleton Ranges

Multiple Event Ranges

All Ranges in Large Table

Final Query

Test Data

Large Table

Small Table

Final Query Results

Intermediate results

Related Questions in MYSQL

Related Questions in SQL-ORDER-BY

Related Questions in MYSQL-5.1

Trending Questions

Popular # Hahtags

Popular Questions