I'm working on a set of database tables in Oracle and trying to figure out a way to enforce referential integrity with slightly polymorphic data.
Specifically, I have a bunch of different tables--hypothetically, let's say I have Apples, Bananas, Oranges, Tangerines, Grapes, and a hundred more types of fruit. Now I'm trying to make a table which describes performing steps involving a fruit. So I want to insert one row that says "eat Apple ID 100", then another row which says "peel Banana ID 250", then another row which says "refrigerate Tangerine ID 500", and so on.
Historically, we've done this in two ways:
1 - Include a column for each possible type of fruit. Use a check constraint to ensure that all but one column is NULL. Use foreign keys to ensure referential integrity to our fruit. So in my hypothetical example, we'd have a table with columns ACTION, APPLEID, BANANAID, ORANGEID, TANGERINEID
, and GRAPEID
. For the first action, we'd have a row 'Eat', 100, NULL, NULL, NULL, NULL, NULL
. For the second action, we'd have 'Peel', NULL, 250, NULL, NULL, NULL
. etc. etc.
This approach is great for getting all of Oracle's RI benefits automatically, but it just doesn't scale to a hundred types of fruit. You end up getting too many columns to be practical. Just figuring out which type of fruit you are dealing with becomes a challenge.
2 - Include a column with the name of the fruit, and a column with a fruit ID. This works also, but there isn't any way (AFAIK) to have Oracle enforce the validity of the data in any way. So our columns would be ACTION, FRUITTYPE
, and FRUITID
. The row data would be 'Eat', 'Apple', 100
, then 'Peel', 'Banana', 250
, etc. But there's nothing preventing someone from deleting Apple ID 100, or inserting a step saying 'Eat', 'Apple', 90000000
even though we don't have an Apple with that ID.
Is there a way to avoid maintaining a separate column per each individual fruit type, but still preserve most the benefits of foreign keys? (Or technically, I could be convinced to use a hundred columns if I can hide the complexity with a neat trick somehow. It just has to look sane in day-to-day use.)
CLARIFICATION: In our actual logic, the "fruits" are totally disparate tables with very little commonality. Think customers, employees, meetings, rooms, buildings, asset tags, etc. The list of steps is supposed to be free-form and allow users to specify actions on any of these things. If we had one table which contained each of these unrelated things, I wouldn't have a problem, but it would also be a really weird design.
It's not clear to me why you need to identify the FRUIT_TYPE on the TASKS table. On the face of it that's just a poor (de-normalised) data model.
In my experience, the best way of modelling this sort of data is with a super-type for the generic thing (FRUIT in your example) and sub-types for the specifics (APPLE, GRAPE, BANANA). This allows us to store common attributes in one place while recording the particular attributes for each instance.
Here is the super-type table:
FRUITS has a primary key and a compound unique key. We need the primary key for use in foreign key constraints, because compound keys are a pain in the neck. Except when they are not, which is the situation with these sub-type tables. Here we use the unique key as the reference, because by constraining the value of FRUIT_TYPE in the sub-type we can guarantee that records in the GRAPES table map to FRUITS records of type 'GRAPE', etc.
In 11g we can make FRUIT_TYPE a virtual column for the sub-type and do away with the check constraint.
So, now we need a table for task types ('Peel', 'Refrigerate', 'Eat ', etc).
And the actual TASKS table is a simple intersection between FRUITS and TASK_TYPES.
If this does not satisfy your needs please edit your question to include more information.
Yes I wondered whether that was the motivation underlying the OP's posted design. But usually workflow is a lot more difficult than that: some tasks will apply to all fruits, some will only apply to (say) fruits which come in bunches, others will only be relevant to bananas.
So you have a bunch of existing tables. You want to be able to assign records from these tables to tasks in a freewheeling style yet be able to guarantee the identify of the specific record which owns the task.
I think you still need a generic table to hold an ID for the actor in the task, but you will need to link it to the other tables somehow. Here is how I might approach it:
Soem sample existing tables:
A generic table to hold actors:
Now, you need intersection tables to associate your existing tables with the new one:
The TASKS table is rather unsurprising, given what's gone before:
I agree all those intersection tables look like a lot of overhead but there isn't any other way to enforce foreign key constraints. The additional snag is creating ACTORS and CUSTOMER_ACTORS records every time you create a record in CUSTOMERS. Ditto for deletions. The only good news is that you can generate all the code you need.
Is this solution better than a table with one hundred optional foreign keys? Perhaps not: it's a matter of taste. But I like it better than having no foreign keys at all. If there is on euniversal truth in database practice it is this: databases which rely on application code to enforce relational integrity are databases riddled with children referencing the wrong parent or referencing no parent at all.