Does mysql optimize result table to avoid duplicate values

224 Views Asked by At

In case of a JOIN, especially in a one to many relationship, the result set will very often contain a lot of duplicate information in the result.

For example,

TABLE_A_ID TABLE_A_FIELD_ONE TABLE_A_FIELD_TWO TABLE_B_ID TABLE_B_FOREIGN_KEY TABLE_B_FIELD_ONE
12 MARK POLIO 41 12 SHOE
12 MARK POLIO 42 12 SHIRT
12 MARK POLIO 43 12 BOOT
12 MARK POLIO 44 12 HAT
12 MARK POLIO 45 12 GLOVE

As you can see, 12, MARK, POLIO are sent six times in a row. In my real situation, I am noticing that I am sending the same data thousands of times in a row, sending over 20x the amount of data over the wire from db to app server, and 20x as much memory.

I was about to start optimizing the **** out of it, but then I realized that it is very likely that mysql drivers already do optimize this, for example by sending only table b columns, and table a columns for when TABLE_B_FOREIGN_KEY changes.

It is really optimized this way or another way, or should I stop using joins and instead do round trips to the DB?

1

There are 1 best solutions below

1
Rick James On

When the duplicates are in a single table, consider adding UNIQUE (col) to that table. This forces uniqueness (at Insertion time) based on col. If Uniqueness is determined by a combination of columns, then UNIQUE(col1, col2, ...).

More likely you need DISTINCT to dedup whole rows.

SELECT DISTINCT col1, col2, ...
    FROM ...
    JOIN ... ON ...

But there are cases where the dedupping should be done in other ways. Let's see your SELECT. (I'm thinking of GROUP BY or a 'derived table'.)