I am quite confused by something I'm seeing in an Oracle 10 database.
I have the following query.
select
t2.duplicate_num
from table1 t1, table2 t2,
(
select joincriteria_0 from intable1 it1, intable2 it2
where it2.identifier in (4496486,5911382)
and it1.joincriteria_0 = it2.joincriteria_0
and it1.filter_0 = 1
) tt
where t1.joincriteria_0 = tt.joincriteria_0
and t2.joincriteria_1 = t1.joincriteria_1
and t2.filter_0 = 3
and t2.filter_1 = 1
and t2.filter_2 not in (48020)
It doesn't really seem like anything special to me, here are the baseline performance numbers from autotrace:
CR_GETS: 318
CPU: 3
ROWS: 33173
Now if I add the 'DISTINCT' keyword to the query (e.g. 'select distinct t2.duplicate_num...') this happens
CR_GETS: 152921
CPU: 205
ROWS: 305
The query plan has not changed, but the logical IO grows by a factor of 500. I was expecting CPU only to go up and logical IO to be largely unchanged.
The net result is a query that runs 10-100x slower with the distinct keyword. I can put code into the applciation which would make the result set distinct in a fraction of the time. How does this make any sense? particularly without the query plan changing?
This indicates a lack of index somewhere. It also means, your original query without the distinct clause wasn't optimized. With "distinct" also it could not be optimized, so the query plan remained the same. An unoptimized query varies widely in performance due to the full table scans.