In the case of a JOIN, especially in a one-to-many relationship, the result set will very often contain a lot of duplicate information.
For example,
| TABLE_A_ID | TABLE_A_FIELD_ONE | TABLE_A_FIELD_TWO | TABLE_B_ID | TABLE_B_FOREIGN_KEY | TABLE_B_FIELD_ONE |
|---|---|---|---|---|---|
| 12 | MARK | POLIO | 41 | 12 | SHOE |
| 12 | MARK | POLIO | 42 | 12 | SHIRT |
| 12 | MARK | POLIO | 43 | 12 | BOOT |
| 12 | MARK | POLIO | 44 | 12 | HAT |
| 12 | MARK | POLIO | 45 | 12 | GLOVE |
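
A join of roughly this shape produces that result set (the table and column names here are just inferred from the headers above, not my real schema):

```sql
-- One-to-many join: every row of table_b drags along its parent's columns,
-- so table_a's values are repeated once per matching child row.
SELECT a.id        AS table_a_id,
       a.field_one AS table_a_field_one,
       a.field_two AS table_a_field_two,
       b.id        AS table_b_id,
       b.a_id      AS table_b_foreign_key,
       b.field_one AS table_b_field_one
FROM table_a AS a
JOIN table_b AS b ON b.a_id = a.id
WHERE a.id = 12;
```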
As you can see, `12, MARK, POLIO` are sent five times in a row. In my real situation, I am sending the same data thousands of times in a row, transferring over 20x the amount of data over the wire from the DB to the app server and using 20x as much memory.
I was about to start optimizing the **** out of it, but then I realized that it is very likely that MySQL drivers already optimize this, for example by sending only the TABLE_B columns, and the TABLE_A columns only when TABLE_B_FOREIGN_KEY changes.
Is it really optimized this way (or another way), or should I stop using joins and instead do round trips to the DB?
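
The round-trip alternative I have in mind would look something like this (again, the schema names are only illustrative): fetch the parent row once, then fetch only the child columns:

```sql
-- Query 1: fetch the parent row once.
SELECT id, field_one, field_two
FROM table_a
WHERE id = 12;

-- Query 2: fetch only the child columns, so the parent's values
-- are not repeated for every child row.
SELECT id, field_one
FROM table_b
WHERE a_id = 12;
```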
When the duplicates are in a single table, consider adding `UNIQUE (col)` to that table. This forces uniqueness (at insertion time) based on `col`. If uniqueness is determined by a combination of columns, then `UNIQUE(col1, col2, ...)`.

More likely you need `DISTINCT` to dedup whole rows. But there are cases where the dedupping should be done in other ways. Let's see your `SELECT`. (I'm thinking of `GROUP BY` or a 'derived table'.)
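
As a rough sketch, using the made-up `table_a`/`table_b` names from the question, those options look like this:

```sql
-- If the duplicates live in one table, a uniqueness constraint
-- rejects them at insertion time:
ALTER TABLE table_b ADD UNIQUE (a_id, field_one);

-- DISTINCT dedups whole rows (only helps if entire rows repeat):
SELECT DISTINCT a.field_one, a.field_two
FROM table_a AS a
JOIN table_b AS b ON b.a_id = a.id;

-- Or collapse the child rows per parent with GROUP BY,
-- e.g. aggregating the children into one column:
SELECT a.id, a.field_one, a.field_two,
       GROUP_CONCAT(b.field_one) AS b_values
FROM table_a AS a
JOIN table_b AS b ON b.a_id = a.id
GROUP BY a.id, a.field_one, a.field_two;
```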