"Master" associative table?

542 Views Asked by At

Consider a model for matching clients and sevices. Clients may be both providers of and consumers of services at various times. Clients may be individuals or groups (companies), the latter having multiple contacts. Contacts may have multiple addresses, phones, e-mails. Some of these relationships will be one-to-one (e.g., service to provider), but most will be one-to-many or many-to-many (multiple contacts at a company would have the same address).

In this model several associative tables would typically exist, e.g., client_contact, contract_addr, contact_phone, contact_email, service_provider, service_consumer, etc.

Say you issue a simple query for contact information for consumers of a given service. In addition to the six entity tables containing the data, the joins would reference five associative tables. Nothing particularly interesting about about this kind of query, of course - we do it every day.

It occurred to me though: why not have a single "master" associative table holding all associations? It would require this master table to have an "association type" in addition to the two PKs, and for all PKs to be of the same type (ints, GUIDs, etc.).

On the one hand, queries would become more complicated because each join would need to specifiy the type and PK. On the other hand, all joins would access the same table, and with appropriate indexng and caching performance could improve dramatically.

I assumed there might be a pattern (or anti-pattern) describing this approach, but haven't found anything on-line. Has anyone tried it? If so, does it scale?

Any references you can provide would be appreciated.

3

There are 3 best solutions below

1
On

What you're describing reminds me of fact tables from data warehousing. My understanding is that you start with a typical transactional schema with a table to model every many-to-many relationship. Then, to restructure the data for easier dimensional analysis, you can aggregate some / all the relationships in your schema into one wide table where each column is a key. This effectively performs all possible joins ahead of time and dumps them into a table, inverting the purpose of query joins from relationship following to getting to the properties of your entities.

Anyway, my understanding of this stuff is hazy and my experience effectively nil, but maybe your idea is a fact table by another name, making them useful to investigate.

0
On

First off, I think you're definitely paying a price in maintainability. Any time I have a "type" column like that, I think red flag. It seems likely to lead to magic strings in your procedures--you need to make sure type is consistent across inserts and selects, eg. So any performance increase needs to be big enough to justify this headache.

Second, you're paying a price in storing more data--the extra "type" column for each association. And then this data needs to be retrieved when running a query, which affects how many rows can be in memory at once (maybe).

Third, each query probably needs to access the same total number of rows, regardless of whether they're stored in multiple tables or one. So, unless you know something about your data that will let you create clustered indexes or something, you're probably retrieving the same number of pages when you run queries.

Fourth, the likely performance gains come from assuming that the index has a logarithmic behavior, and noting that 5log(N) is greater than log(5N), so it's better to use one big index than 5 smaller ones. However, the addition of the type column is going to reduce this benefit. I'm not really sure how to analyze if it would eliminate it completely, or just reduce it.

Fifth, it seems pretty likely that for at least some queries, you're going to end up joining multiple copies of that huge table, which really seems like it's going to be a killer.

I'd be interested to see what results you get, but I'd be surprised if there's a performance benefit.

0
On

This can be solved with abstraction and table inheritance.

An Individual Client, Organization Client, Service Provider are all Parties, which play Roles.

An Email Address, Telephone Number, Web Address, and Physical Address are all Addresses.