Performance of UNION vs UNION ALL


I am selecting a single column of foreign keys from multiple tables through either UNION or UNION ALL.

It is generally recommended to use UNION ALL instead of UNION for performance reasons when duplicates do not matter. However, in my calling PHP script it would be more efficient to loop through and manipulate the data without duplicates.

So, I can use either of the following options:

Option 1:

Use UNION in the database to eliminate duplicates

Option 2:

Use UNION ALL in the database and eliminate the duplicates in my PHP script using array_unique() or a similar function.

My assumption is that Option 1 would be the preferred and more efficient method in the majority of cases. However, I have nothing to back up that assumption, and I am not sure of the best way to test it, especially since the result would likely depend a lot on the data.

Is my assumption correct in most cases? If so, why? If not, why not?


There are 2 best solutions below

0
On BEST ANSWER

The main point is that UNION is shorthand for UNION DISTINCT, so the difference in performance between UNION and UNION ALL comes from the need to produce a distinct result. For that job, the database engine and the query optimizer are generally more effective and more efficient than a filtering algorithm written in PHP application code.

The DISTINCT operation can, moreover, benefit from the optimizations that already exist for GROUP BY.

In addition, duplicate filtering generally relies on ordered data, whereas a SQL SELECT returns rows without any explicit ordering, so filtering in the application can mean less efficient and longer-running queries.

Generally the database engine is much more efficient than PHP application code, so Option 1 is generally the better choice.
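To make the two options concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL and PHP so that it runs anywhere; the tables `orders` and `invoices` and the column `customer_id` are hypothetical, and `dict.fromkeys` plays the role that array_unique() would play in PHP. It only shows that both options yield the same set of keys while Option 2 transfers more rows; it says nothing about timing on real data.

```python
import sqlite3

# Hypothetical schema: two tables that each hold a foreign-key column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (customer_id INTEGER);
    CREATE TABLE invoices (customer_id INTEGER);
    INSERT INTO orders   VALUES (1), (2), (2), (3);
    INSERT INTO invoices VALUES (2), (3), (4);
""")

# Option 1: the database removes duplicates (UNION means UNION DISTINCT).
option1 = [row[0] for row in conn.execute(
    "SELECT customer_id FROM orders UNION SELECT customer_id FROM invoices"
)]

# Option 2: fetch every row, then de-duplicate in application code.
all_rows = [row[0] for row in conn.execute(
    "SELECT customer_id FROM orders UNION ALL SELECT customer_id FROM invoices"
)]
option2 = list(dict.fromkeys(all_rows))  # order-preserving de-duplication

print(len(all_rows), "rows fetched with UNION ALL")
print(sorted(option1), sorted(option2))
```

With this sample data, Option 2 fetches 7 rows to end up with the same 4 distinct keys that Option 1 returns directly.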

2
On

Speed-wise, it is relatively insignificant. The effort to do all the SELECTs is more than to do the de-dup, whichever way you do it.

Therefore, I recommend saying UNION DISTINCT, since that is fewer keystrokes for you than array_unique(...).

Other considerations:

  • UNION ALL would shovel more stuff from the server to the client; this could be a factor in performance in extreme cases or when the server is far from the client.
  • If you are also saying ORDER BY on the UNION, you may as well do the DISTINCT, too.
  • GROUP BY (on the UNION) has the effect of DISTINCT.
  • If you are talking about millions of rows, keep in mind that PHP can hit memory limits on arrays, whereas MySQL is essentially unlimited.
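The ORDER BY and GROUP BY points can be checked with a small sketch. Again this uses Python's sqlite3 module in place of MySQL, with hypothetical tables `a` and `b` and a hypothetical column `fk`; the MySQL optimizer may plan these queries differently, so treat it as an illustration of the result sets only.

```python
import sqlite3

# Hypothetical tables holding overlapping foreign keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (fk INTEGER);
    CREATE TABLE b (fk INTEGER);
    INSERT INTO a VALUES (3), (1), (3);
    INSERT INTO b VALUES (2), (1);
""")

# De-duplicate and sort in one statement: ORDER BY applies to the whole UNION.
distinct_sorted = [r[0] for r in conn.execute(
    "SELECT fk FROM a UNION SELECT fk FROM b ORDER BY fk"
)]

# GROUP BY over a UNION ALL subquery collapses duplicates the same way.
grouped = [r[0] for r in conn.execute(
    "SELECT fk FROM (SELECT fk FROM a UNION ALL SELECT fk FROM b) AS u "
    "GROUP BY fk ORDER BY fk"
)]

print(distinct_sorted, grouped)
```

Both queries return the same de-duplicated, sorted list, so there is no reason to pair UNION ALL with a GROUP BY just to remove duplicates.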