Performance comparison of surrogate & composite key

If a table has attributes A1, A2, A3, ..., An, and A1, A2 and A3 together can form a composite key, is it better to use a surrogate key instead of the composite key?

Using a surrogate key will improve the insertion speed of records (this favors the surrogate key over the composite key). But SELECT, UPDATE and DELETE queries based on the attributes A1, A2 and A3 will be tremendously slowed down if we use a surrogate key (this favors the composite key over the surrogate key).

Which is better in terms of performance given such conditions? Surrogate key or composite key?

There are 2 best solutions below

2
On

In nearly all tests, there was little to no performance advantage of surrogate keys over natural keys. Natural keys also have the advantage of being much easier to work with. A better write-up is available here.

4
On

Performance is not the primary concern when choosing whether to implement a surrogate primary key.

We find that the ideal primary key has several desirable attributes:

  • simple (single column, native datatype)
  • unique (positively NO duplicate values)
  • non null (every row will have a value)
  • immutable (once assigned it is never changed)
  • anonymous (carries no "information")

There's no "rule" that says that the candidate key selected as the primary key must have all of these properties, but these are properties that are desirable, for various reasons.

There's not even a "rule" that says all tables need to have a primary key. But we find it desirable that they do.

Successful software systems have been built using surrogate keys as well as natural keys.


In terms of performance, there's not really much of a difference that can be demonstrated. But do consider this: if an entity table has a composite primary key made up of several "large" columns, those same large columns have to be repeated in every table that has a foreign key reference to that entity table, and in some storage engines (InnoDB, for example), they are also repeated in every secondary index.
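To make the column repetition concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (parent_composite, child_composite, parent_surrogate, child_surrogate, payload) are hypothetical and chosen only for illustration. It also shows that a surrogate-keyed table can still carry a UNIQUE constraint on (a1, a2, a3), so a lookup by those columns remains an index search. SQLite is used only because it is easy to run; other engines may plan and store things differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant 1: composite natural key; the child repeats all three columns.
CREATE TABLE parent_composite (
    a1 TEXT NOT NULL, a2 TEXT NOT NULL, a3 TEXT NOT NULL,
    payload TEXT,
    PRIMARY KEY (a1, a2, a3)
);
CREATE TABLE child_composite (
    child_id INTEGER PRIMARY KEY,
    a1 TEXT NOT NULL, a2 TEXT NOT NULL, a3 TEXT NOT NULL,
    FOREIGN KEY (a1, a2, a3) REFERENCES parent_composite (a1, a2, a3)
);

-- Variant 2: surrogate key; the natural columns stay unique but are
-- not copied into the child, which carries one integer instead.
CREATE TABLE parent_surrogate (
    id INTEGER PRIMARY KEY,
    a1 TEXT NOT NULL, a2 TEXT NOT NULL, a3 TEXT NOT NULL,
    payload TEXT,
    UNIQUE (a1, a2, a3)
);
CREATE TABLE child_surrogate (
    child_id INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent_surrogate (id)
);
""")


def fk_columns(table):
    # PRAGMA foreign_key_list yields one row per referencing column;
    # index 3 of each row is the "from" column in the child table.
    return [row[3] for row in conn.execute(f"PRAGMA foreign_key_list({table})")]


print(len(fk_columns("child_composite")))  # referencing columns, composite variant
print(len(fk_columns("child_surrogate")))  # referencing columns, surrogate variant

# Ask SQLite how it would look up a surrogate-keyed row by its natural columns.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM parent_surrogate WHERE a1 = ? AND a2 = ? AND a3 = ?",
    ("x", "y", "z"),
).fetchall()
plan_detail = " ".join(row[3] for row in plan)
print(plan_detail)
```

The plan text should mention the automatically created unique index rather than a full table scan, which suggests that choosing a surrogate primary key does not by itself force slow lookups on A1, A2 and A3.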

But performance is not really the deciding factor. (Anyone who suggests that performance should be the deciding factor in selecting a candidate key as the primary key hasn't really thought about it enough.)


As far as being "easier to work with", many developers find it easier to use a single column as the primary key vs. a composite key consisting of two, three, or more columns.

Some developers who opted for natural keys as the primary key have later been burned by their selection of a candidate key. Not because it was a natural key, but because further along in development, "new" requirements were "discovered", and it turned out that the candidate key they had selected as the primary key wasn't really always unique, or that it wasn't exempt from being changed, or that it wasn't really anonymous.
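As an illustration of the "wasn't exempt from being changed" case, the following sketch (again Python with sqlite3) uses a hypothetical customer table keyed by email address. The tables customer_natural, order_natural, customer_surrogate and order_surrogate are invented for this example and do not come from the question. When the natural key value changes, every referencing row has to be rewritten (here through ON UPDATE CASCADE), while in the surrogate variant only the one customer row changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
-- Natural key: the email is the primary key and is copied into each order.
CREATE TABLE customer_natural (
    email TEXT PRIMARY KEY
);
CREATE TABLE order_natural (
    order_id INTEGER PRIMARY KEY,
    email TEXT NOT NULL
        REFERENCES customer_natural (email) ON UPDATE CASCADE
);

-- Surrogate key: orders reference an id that never changes.
CREATE TABLE customer_surrogate (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE order_surrogate (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer_surrogate (id)
);
""")

conn.execute("INSERT INTO customer_natural VALUES ('old@example.com')")
conn.executemany(
    "INSERT INTO order_natural (email) VALUES (?)",
    [("old@example.com",)] * 3,
)
conn.execute("INSERT INTO customer_surrogate (id, email) VALUES (1, 'old@example.com')")
conn.executemany(
    "INSERT INTO order_surrogate (customer_id) VALUES (?)",
    [(1,)] * 3,
)

# The "immutable" key turns out to change after all.
conn.execute("UPDATE customer_natural SET email = 'new@example.com'")
conn.execute("UPDATE customer_surrogate SET email = 'new@example.com' WHERE id = 1")

# Natural variant: the cascade rewrote the key in every order row.
natural_orders = [r[0] for r in conn.execute("SELECT email FROM order_natural")]
# Surrogate variant: the order rows were not touched at all.
surrogate_orders = [r[0] for r in conn.execute("SELECT customer_id FROM order_surrogate")]
print(natural_orders)
print(surrogate_orders)
```

Without ON UPDATE CASCADE (or equivalent application code), the update in the natural-key variant would be rejected while orders still reference the old value, which is the kind of surprise described above.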

There are lots of software projects that have been successful using natural keys and composite keys as the PRIMARY KEY, just as there has been success using a surrogate key as the PRIMARY KEY.