How to avoid mistakes when choosing a primary key

Hi, I'm a beginner with databases, so I want to ask which attributes I should use as the primary key to avoid mistakes:

    CREATE TABLE customer(
    name
    first_lastname
    street
    ZIP_code
    mobile_phone
    telephone
    email
    gender
    birthdate
    nationality);

Optionally I was thinking to add idcustomer as auto_increment but I am not sure that will be a great idea.

There are 5 best solutions below

7
On BEST ANSWER

"I was thinking to add idcustomer as auto_increment but I am not sure that will be a great idea."

It is indeed a good idea.

Your other columns (attributes) do not necessarily have unique values. In other words, they are not suitable to use as natural primary keys. What sort of value might work as a natural primary key? An employee number, possibly. A product serial number might work. Taxpayer ID numbers (social security numbers) do not work: a surprising number of people use duplicate numbers by mistake. The uniqueness standard for choosing a real-world item as a primary key is so high that most database designers don't even try.

So creating a guaranteed-unique primary key is generally good design. The jargon for that kind of key is surrogate primary key. Most database systems, MySQL included, provide auto-incrementing numbers for that purpose.

You can choose one of two conventions for naming that id value. One is to call it id. The other is to call it customer_id (the table name with _id added). The second one will help you keep things straight when you start using those values in other tables to establish relationships.

For example, you might have a sales table. That table might have these columns:

    sales_id      auto-incrementing primary key
    customer_id   id of the customer to whom the sale was made (foreign key)
    item_sold     description of the item
    list_price
    discount
    net_price

You get the idea. Read about primary keys and foreign keys. In the jargon of "logical database design," you can read about entities (customer, sales) and relationships. Each table gets its own series of auto-incrementing values.
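
As a rough sketch, the two tables could be declared like this in MySQL. The column types and lengths, the constraint name fk_sales_customer, and the choice of which columns are NOT NULL are my assumptions for illustration; most of the customer columns from the question are left out for brevity.

```sql
-- Sketch only: types, lengths, and constraint names are assumptions.
CREATE TABLE customer (
    customer_id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- surrogate key
    name           VARCHAR(100) NOT NULL,
    first_lastname VARCHAR(100) NOT NULL,
    email          VARCHAR(255),
    PRIMARY KEY (customer_id)
) ENGINE=InnoDB;

CREATE TABLE sales (
    sales_id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- its own id series
    customer_id BIGINT UNSIGNED NOT NULL,                 -- same type as customer.customer_id
    item_sold   VARCHAR(255) NOT NULL,
    list_price  DECIMAL(10,2) NOT NULL,
    discount    DECIMAL(10,2) NOT NULL DEFAULT 0,
    net_price   DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (sales_id),
    -- Each sale must point at an existing customer row.
    CONSTRAINT fk_sales_customer
        FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
) ENGINE=InnoDB;
```

With the foreign key in place, MySQL should reject a sales row whose customer_id does not exist in customer.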

You can then use a query like this to summarize the sales made to each customer.

 -- One row per customer: how many sales, and the total revenue from them.
 SELECT customer.name, customer.first_lastname,
        COUNT(sales.sales_id) AS number_of_sales,
        SUM(sales.net_price) AS revenue
   FROM customer
   JOIN sales ON customer.customer_id = sales.customer_id
  GROUP BY customer.customer_id, customer.name, customer.first_lastname;

Here the sales entity has a many-to-one relationship to the customer entity. That's implemented by having a customer_id attribute in each sales row pointing back to the customer.
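
One thing to be aware of: the plain JOIN in the query above only returns customers who have at least one sale. If you also want customers with no sales, a LEFT JOIN variant along these lines should work. This is a sketch that assumes the same customer and sales columns as the example; the COALESCE is my addition so that revenue shows 0 instead of NULL.

```sql
-- Keep every customer, including those with no matching sales rows.
SELECT customer.customer_id, customer.name, customer.first_lastname,
       COUNT(sales.sales_id) AS number_of_sales,            -- counts only non-NULL ids, so 0 when no sales
       COALESCE(SUM(sales.net_price), 0) AS revenue         -- SUM over no rows is NULL
  FROM customer
  LEFT JOIN sales ON customer.customer_id = sales.customer_id
 GROUP BY customer.customer_id, customer.name, customer.first_lastname;
```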

It's also a convention to make the id the first column in each table.

Conventions are good: they help the next person to look at your application. They also help your future self.

Note: my sales table is just an example to show how autoincrementing id values might be useful. I don't claim it's a good layout for a real-world sales table: it is not.

0
On

The safest approach is to create a PK column named id on each table. Don't be a hero, just go for an unsigned bigint. PK overflow, however unlikely, is not a problem you want.

You can use: id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY

Or replace the type and attributes with the SERIAL keyword (id SERIAL PRIMARY KEY), which is an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE.
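
To make the two spellings concrete, here is a small sketch. The table names example_explicit and example_serial are made up for illustration.

```sql
-- Explicit form.
CREATE TABLE example_explicit (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
);

-- SERIAL shorthand. SERIAL already implies UNIQUE, so adding PRIMARY KEY
-- should leave the table with a primary key plus a separate unique index on id.
CREATE TABLE example_serial (
    id SERIAL PRIMARY KEY
);

-- Inspect what MySQL actually created.
SHOW CREATE TABLE example_serial;
```

The extra unique index is harmless but redundant, which is one reason you might prefer the explicit form.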

Keep in mind that AUTO_INCREMENT can cause issues if you're using statement-based replication. Statement-based replication was the default before MySQL 5.7.7.
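
If you're not sure which format your server uses, you can check the binlog_format system variable. A minimal check, assuming you have permission to read global variables:

```sql
-- Returns STATEMENT, ROW, or MIXED.
SELECT @@GLOBAL.binlog_format;
```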

Using a synthetic key decouples the features of an object you're modelling from the unique identifier of that object, which is handy if you need to alter the schema. Altering a MySQL PK is expensive. It also guarantees you'll have a unique, non-null column for referencing with Foreign Keys. Also, some ORMs expect an id PK column, if you're into that sort of thing.

With MySQL you can create a composite clustered index, which is a Primary Key having more than one column. This could be an optimization if you know for certain that the table will never become gigantic, and that you'll be regularly accessing the table with complex filters which specify a leftmost subset of the columns in that key. I wouldn't use this approach though.
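
For illustration, a composite primary key and the kind of filters that can use it might look like this. The order_line table and its columns are hypothetical, not part of the question.

```sql
-- Hypothetical table: rows are clustered by (order_id, line_number).
CREATE TABLE order_line (
    order_id    BIGINT UNSIGNED NOT NULL,
    line_number SMALLINT UNSIGNED NOT NULL,
    item_sold   VARCHAR(255) NOT NULL,
    PRIMARY KEY (order_id, line_number)
) ENGINE=InnoDB;

-- Filters on the leftmost column: can seek on the primary key.
SELECT * FROM order_line WHERE order_id = 42;

-- Filters on the full key: can also seek on the primary key.
SELECT * FROM order_line WHERE order_id = 42 AND line_number = 3;

-- Skips the leftmost column: typically cannot seek on the primary key.
SELECT * FROM order_line WHERE line_number = 3;
```

Running EXPLAIN on each SELECT is a reasonable way to confirm what your MySQL version actually does.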

InnoDB tables always have a clustered index, though. If you don't explicitly define a primary key, InnoDB uses the first UNIQUE index whose columns are all NOT NULL. If there is none, it generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values.

2
On

Surprisingly none of the answers so far have asked about your business requirements. Do you understand your business process, what interactions happen with the customer and how the customer will get identified in the business domain? The identifying attribute(s) - in an e-commerce application it might be a login name for example - usually ought to be a key in your table. Just adding an auto-increment is not the right thing to do unless you understand what that key is for.
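
As a sketch of what that can look like in practice: if the business identifies customers by email address (an assumption on my part, the question doesn't say), you could declare that column as a key whether or not the table also has an auto-increment id. The constraint name is made up.

```sql
-- Assumes the business rule "one customer per email address".
ALTER TABLE customer
    ADD CONSTRAINT uq_customer_email UNIQUE (email);
```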

0
On

There are several desirable properties of a PRIMARY KEY (some of these are pretty obvious, but we'll enumerate them)

  • non null - (each row is guaranteed to have a non-NULL value for all PK columns)
  • unique - (no two rows will ever have the same set of values. ever)
  • simple - (single column, native datatype)
  • short - (the cluster key will be repeated in every secondary index, and foreign keys)
  • immutable - (once assigned, the value will not be changed)
  • anonymous - (does not carry any meaningful information)

We can hold opinions and have discussions about each of these properties, the implications and benefits, and the downsides of primary keys that don't have these properties. But a lot of it ends up being opinion about what is most important, and what isn't important at all.

I have a rationale for holding each and every one of these properties to be desirable. And I recognize that others do not hold the same opinion.

If this list is valid, then a surrogate primary key can fit all of these.

In MySQL, one possible way to implement a surrogate primary key would be an extra column added to the table:

 CREATE TABLE mytable 
 ( id                INT NOT NULL AUTO_INCREMENT PRIMARY KEY  COMMENT 'PK'
 , cust_email        VARCHAR(255) NOT NULL                    COMMENT 'UX1'
 , cust_name_title
 , cust_name_first
 , cust_name_last
 , cust_name_suffix
 , cust_addr_street
 , cust_addr_line2
 , cust_addr_city
 , cust_addr_state
 , cust_addr_postal_code
 , UNIQUE KEY customer_UX1 (cust_email) 
 )
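
The column types are left out above. A runnable variant might look like the following; every type and length other than those for id and cust_email is my assumption, chosen only so the statement executes.

```sql
-- Sketch only: types and lengths for the name and address columns are assumptions.
CREATE TABLE mytable
( id                    INT NOT NULL AUTO_INCREMENT PRIMARY KEY  COMMENT 'PK'
, cust_email            VARCHAR(255) NOT NULL                    COMMENT 'UX1'
, cust_name_title       VARCHAR(20)
, cust_name_first       VARCHAR(100)
, cust_name_last        VARCHAR(100)
, cust_name_suffix      VARCHAR(20)
, cust_addr_street      VARCHAR(255)
, cust_addr_line2       VARCHAR(255)
, cust_addr_city        VARCHAR(100)
, cust_addr_state       VARCHAR(100)
, cust_addr_postal_code VARCHAR(20)
, UNIQUE KEY customer_UX1 (cust_email)
);

-- The id is assigned automatically; it is not listed in the INSERT.
INSERT INTO mytable (cust_email, cust_name_first) VALUES ('ana@example.com', 'Ana');

-- A second row with the same email is rejected by the unique key.
INSERT INTO mytable (cust_email, cust_name_first) VALUES ('ana@example.com', 'Anna');
```

The point of the pairing is that the surrogate id gives you the short, immutable, anonymous key, while the unique key on cust_email still enforces the real-world uniqueness rule.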

Note that using AUTO_INCREMENT is not a requirement. This is a feature that many find useful and easy to use. (There are some details about AUTO_INCREMENT that make it a less than perfect feature in terms of PRIMARY KEY.)


IMPORTANT

I do not assert that using a surrogate primary key is the right way, or the only way.

A surrogate primary key is not a requirement of a successful database implementation project. Many successful projects are implemented using natural keys.

But I will note (in closing) that some adamant believers in natural keys have been badly burned when it turns out (much later in the project, newly discovered requirements) that the natural keys that were selected turn out not to satisfy one (or more) of the "desirable properties" I listed.

0
On

A primary key is a column or set of columns that uniquely identifies each row in a table. With that in mind, you can make whichever column(s) uniquely identify the customer rows the primary key. You could use the phone number, or a combination of first name, last name, and phone number. But the more accepted way is to add an additional column, perhaps named idcustomer as you suggested, or customer_id, or just id, that is guaranteed to be unique for each customer, and make it the primary key; making this integer column auto_increment is a good idea.