I have read it somewhere in article on query optimization that instead applying selections on cartesion product, it is better to use selection theta joins. So basically if we have 2 employee tables with age as column then
select name from A inner join B on A.age>B.age
is faster than
select name from A,B where A.age>B.age
But we all know that first query is equivalent or has same meaning as query 2. Then why it is said that applying selections on theta join is faster than applying select on cartesion product I am trying to learn query optimization. And I read following statement
Selection(E1 *E2)= E1(theta join operator)E2
kindly read point 4 of this article--- https://www.geeksforgeeks.org/query-optimization-in-relational-algebra/
What is the difference between the 2 queries in terms of performance?
This is the first time I have heard the term "Theta Join". Database developers simply call these inner joins. Your first query
shows an inner join. Your second query
also shows an inner join, only with deprecated syntax. Before explicit joins (
[INNER] JOIN,LEFT [OUTER] JOIN,CROSS JOIN) etc. got introduced in the SQL standard in 1992, we just listed the tables comma-separated in theFROMclause and used theWHEREclause to state the join criteria. While this old syntax still works, it is looked upon with disdain if you still use it.The following is also an inner join:
Only, it is a hidden (or better: obfuscated) inner join. The developer makes it look like a cross join of the tables and then adds the join criteria later. This is syntactically correct, but semantically it isn't. So, we shouldn't write such queries.
To the DBMS is should make no difference, if you write query #1, #2 or #3. It is all the same query in the end. It would speak against the DBMS, didn't it recognize this and come up with the same execution plan for all these queries. The article mentioning that query #1 is faster than query #2 is either wrong or is talking about some particular DBMS and probably even a particular version of that DBMS then.
Maybe still, the article didn't want to talk about how to write queries, but how to process them. If a DBMS would always build a cartesian product and only then filter the data, it might often be slow. It is usually better to filter early and work with smaller data sets.