Impala: Is it logical to split a joining query with OR condition into two queries?

197 Views Asked by At

I have the below query structure that I am using Impala to perform, that it took over 20 hours and didn't finish:

INSERT INTO Final_table
with t1
AS
(SELECT account_id, request_id, status_1
 FROM table_1
 WHERE status_1 = "20"
),
t2 AS
(
 SELECT account_id, request_id, status_2
 FROM table_2
 WHERE status_2 = "10"
)
SELECT t2.account_id, t2.request_id, t2.status_1, t1.status_2
FROM t1
INNER JOIN t2
ON (t1.account_id = t2.account_id OR t1.request_id = t2.request_id);

The problem is exactly in the "OR" condition in the ON statement, because the number of records that result from t1 separately is around 14M, and from t2 separately is around 15M records. Because I faced a memory issue, I have taken the t1 and t2 subqueries, performed them separately and saved them into new tables. and then performed the joining based on the below:

CREATE TABLE sub_table_1
AS
 SELECT account_id, request_id, status_1
 FROM table_1
 WHERE status_1 = "20"
CREATE TABLE sub_table_2
AS
 SELECT account_id, request_id, status_2
 FROM table_2
 WHERE status_2 = "10"
INSERT INTO Final_table
SELECT t2.account_id, t2.request_id, t2.status_1, t1.status_2
FROM sub_table_1 AS t1
INNER JOIN sub_table_2 AS t2
ON (t1.account_id = t2.account_id OR t1.request_id = t2.request_id);

The sub tables were created successfully, but the final joining is still facing the same issue. Will it be logical if I performed the join on two steps, each step with one of the conditions, and then join the two results? Or if there will be another helping method?

1

There are 1 best solutions below

1
On

You can use UNION

  1. Get result from first join (result)

2.result UNION result2 from 2nd join condition

SELECT * FROM t1 JOIN t2 ON t1.account_id = t2.account_id UNION SELECT * FROM t1 JOIN t2 ON t1.request_id = t2.request_id