I am auditing user details from my application using open Id login .If a first time a user is login a OPEN ID we consider as signup . I am generating audit signin report using this details . Sample Table Data.
+---------+----------+-----------+---------------+
| USER_ID | PROVIDER | OPERATION | TIMESTAMP |
+---------+----------+-----------+---------------+
| 120 | Google | SIGN_UP | 1347296347000 |
| 120 | Google | SIGN_IN | 1347296347000 |
| 121 | Yahoo | SIGN_IN | 1347296347000 |
| 122 | Yahoo | SIGN_IN | 1347296347000 |
| 120 | Google | SIGN_UP | 1347296347000 |
| 120 | FaceBook | SIGN_IN | 1347296347000 |
+---------+----------+-----------+---------------+
In this table I want to exclude already SIGN_UP ed "SIGN_IN" ed user count based on provider .
Show Create table
CREATE TABLE `signin_details` (
`USER_ID` int(11) DEFAULT NULL,
`PROVIDER` char(40) DEFAULT NULL,
`OPERATION` char(40) DEFAULT NULL,
`TIMESTAMP` bigint(20) DEFAULT NULL
) ENGINE=InnoDB
I am using this query .
select
count(distinct(USER_ID)) as signin_count,
PROVIDER from signin_details s1
where
s1.USER_ID NOT IN
(
select
USER_ID
from signin_details
where
signin_details.PROVIDER=s1.PROVIDER
and signin_details.OPERATION='SIGN_UP'
and signin_details.TIMESTAMP/1000 BETWEEN UNIX_TIMESTAMP(CURRENT_DATE()-INTERVAL 1 DAY) * 1000 AND UNIX_TIMESTAMP(CURRENT_DATE()) * 1000
)
AND OPERATION='SIGN_IN' group by PROVIDER;
Explain Output:
+----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+
| 1 | PRIMARY | s1 | ALL | NULL | NULL | NULL | NULL | 6 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | signin_details | ALL | NULL | NULL | NULL | NULL | 6 | Using where |
+----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+
Query Output :
+--------------+----------+
| signin_count | PROVIDER |
+--------------+----------+
| 1 | FaceBook |
| 2 | Yahoo |
+--------------+----------+
It takes more than 40 minutes to execute for 200k rows.
My assumption is it will check each row with total number of dependant subquery output.
My Assumption on this query.
A -> Dependant Outputs (B,C,D) .
A check with B
A check with C
A check with D
If dependant query output is larger it will take so long time to execute. How to improve this query?
If you use MySQL you have to know that sub queries performs awful slow.
IN
is slow...EXISTS
is often faster thenIN
JOIN
is mostly the fastest way do things like this.http://sqlfiddle.com/#!2/122ac/12
NOTE: If you wonder about the sqlfiddle result consider here is a
UNIX_TIMESTAMP
in the query.Result:
MySQL and the
INTERSECT
story. You get all combinations ofUSER_ID
andPROVIDER
which you don't want to count. ThenLEFT JOIN
them to your data. Now all the rows you want to count have no values from theLEFT JOIN
. You get them byt.USER_ID IS NULL
.Input: