When to use multiple foreign keys in a single table vs using associative tables?

199 Views Asked by At

I'm trying to understand when to include multiple foreign keys into a table vs when to use associative tables.

Edit: Added a diagram to hopefully make it easier to understand.

I have a settings system that many settings are specific to a division or classification. So, for example, a setting like "number per team" might be different in division 1, class A than in division 3, class A. And "minimum score" might be different in Division 1, class A than in Division 1, class C. Obviously there are also settings that have nothing to do with a division or classification that I feel should be in the same table.

So one way I could do this is to just put the div/class IDs as FKs in the settings table. Then any settings that don't depend on one or the other (or either) would be null for those IDs.

divisions
----------
id (int)
label (varchar)
description (varchar)

classification
--------------
id (int)
label (varchar)
description (varchar)

settings
---------
id (int)
division_id (int) (FK)
class_id (int) (FK)
key (varchar)
value (varchar)

Since there'd only ever be one combination of division, classification, and settings key, that should work, right? But that "feels" wrong somehow - almost like putting the different events one is competing in all in the competitor table. I also cannot guarantee they won't come up with some additional category/type they'd wish to apply settings to later. So that makes me want to do something like:

divisions
----------
id (int)
label (varchar)
description (varchar)

classifications
--------------
id (int)
label (varchar)
description (varchar)

settings
---------
id (int)
key (varchar)
value (varchar)

settings_to_divisions
-------------------------
setting_id (int)
division_id (int)

settings_to_classifications
------------------------------
setting_id (int)
class_id (int)

I think this would allow for easier growth when they come up with some new category (just create 'categories' and a 'settings_to_categories' tables). But then I'd likely have a whole set of nearly duplicate records in the settings table ([id], "min_score",5) and multiple, almost identical tables (id, label, description) for the associated items - all just to create one unique setting record.

Then what happens when they say they want different settings for men vs women? Do I just add an extra field into the settings table and hard code the 'Male' and 'Female' strings into the record, null everywhere except for settings that are gender specific? Or at the other end of things, create a separate 'gender' table with only 2 records and use that in the user table as well? Suddenly, I'm at:

divisions
---------
id (int)
label (varchar)
description (varchar)

classifications
---------------
id (int)
label (varchar)
description (varchar)

categories
----------
id (int)
label (varchar)
description (varchar)

type
----
id (int)
label (varchar)
description (varchar)

gender
------
id (int)
label (varchar)
description (varchar)

settings
---------
id (int)
key (varchar)
value (varchar)

settings_to_divisions
---------------------
setting_id (int)
division_id (int)

settings_to_classifications
---------------------------
setting_id (int)
class_id (int)

settings_to_categories
----------------------
setting_id (int)
category_id (int)

settings_to_type
----------------
setting_id (int)
type_id (int)

settings_to_gender
------------------
setting_id (int)
gender_id (int)

It begins to feel almost ... overly normalized. Though maybe that's just me.

I'm just trying to design this intelligently so that even though some settings apply to all Division 3 players, when they tell me that they need further categories and that the application should behave differently for Division 3, Class A, Category Green, Rogue, Male players vs Female players, I don't have to redesign my database or application too much in order to easily get/set whatever that different setting should be.

I'm sure this isn't a new concept. This has to be a recognized design pattern out there somewhere, but it's hard trying to search for methods and approaches for dealing with a situation where you might have 3 (or 5 or 8) different ways of associating a single database entity.

I'm currently leaning toward the association table approach (significant normalization), but I'm not really a DBA. So am I making a stupid design decision? What would be the negatives/challenges of that approach? Would a still different approach be better?

1

There are 1 best solutions below

0
On

This post is a bit old but I will still answer for future readers:

The simple answer: Use Approach 1

The only reason you would need a table between divisions/classification is to associate a single setting with multiple divisions/classification records.

I can tell you from experience you don't want that. While using the same setting over and over sounds like a great time and space saving mechanism, you will regret it in the future. The second you need to create a new setting for a subset records associated with a single setting or, worse, you will need to make a change to setting table that forces a vast majority records to be associate a single divisions/classification record with a single setting - you will hate your life.

Secondly, creating a setting table where the settings are text strings is very flexible but not over efficient. If you have multiple settings for a single divisions/classification your application must loop through the results performing if statements on each record.

The alternative, of course, is to create a column for each setting. This too can be a pain point as new settings require a deployment. However, if a good number of settings are Boolean, you can achieve significant performance gains but using BIT values.

Example:

If you have 6 BIT fields, the database engine needs to only compare one byte of data/record to find the value needed - versus searching on text which must compare multiple bytes over multiple records.

Lastly, if the number of settings grows significantly in future, performance could be affected using text strings.