I have two tables, as follows (simplified from actual):
mysql> desc small_table; +-----------------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------------+---------------+------+-----+---------+-------+ | event_time | datetime | NO | | NULL | | | user_id | char(15) | NO | | NULL | | | other_data | int(11) | NO | MUL | NULL | | +-----------------+---------------+------+-----+---------+-------+ 3 rows in set (0.00 sec) mysql> desc large_table; +-----------------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------------+---------------+------+-----+---------+-------+ | event_time | datetime | NO | | NULL | | | user_id | char(15) | NO | | NULL | | | other_data | int(11) | NO | | NULL | | +-----------------+---------------+------+-----+---------+-------+ 3 rows in set (0.00 sec)
Now, small_table is, well, small: for each user_id there is usually only one row (though there are sometimes more). In large_table, on the other hand, each user_id appears numerous times.
mysql> select count(1) from small_table\G *************************** 1. row *************************** count(1): 20182 1 row in set (0.00 sec) mysql> select count(1) from large_table\G *************************** 1. row *************************** count(1): 2870522 1 row in set (0.00 sec)
However, and this is important, for each row in small_table, there is at least one row in large_table with the same user_id, the same other_data, and similar event_time (the same within a few minutes, say).
I want to know whether small_table has a row corresponding to the first, or the second, or the whateverth distinct row in large_table for the same user_id and similar event_time. That is, I'd like:
- for each
user_id, a count of distinct rows oflarge_tablein order byevent_time, but only forevent_timewithin, say, three hours; that is, I seek only the count of such rows as haveevent_timewithin, say, three hours of one another; and - for each such collection of distinct rows, an identification of which row in that list (in order by
event_time) has a corresponding row insmall_table.
I don't seem able to write even a query that will do the first step, let alone a query that will do the second, and would appreciate any direction.
I'm not sure what you're asking for in the small margin you mentioned.
also:
So, try: