Many-to-many self-reference without numbered id columns


I want to create a relationship between two Tag entities, but I don't like the way this is typically handled in relational databases.

As in this answer: https://stackoverflow.com/a/35784048/1624397

INSERT INTO RECOMMENDED_BOOKS (Book_id1, Book_id2) VALUES (1, 2);
INSERT INTO RECOMMENDED_BOOKS (Book_id1, Book_id2) VALUES (1, 3);

That is, numbered columns such as Book_id1 and Book_id2.

Here is another "bad" example I'm looking for an alternative to (although it does make sense in that case):

A User that references itself through friendsWithMe and myFriends.

If I use columns like tag_id1 and tag_id2, I will either have to search both column orders to find out whether two tags are related, or store every relationship twice.

Is there any alternative solution?

Preferably, the solution would be storage-agnostic.



Answer by reaanb

If I understand correctly, you have a problem with symmetric relationships, since there are two ways to represent any pair of associated tags. Recording both ways results in redundant data, e.g. (1, 2) represents the same relationship as (2, 1). Recording only one of the two, without a symmetry-breaking rule, requires more complex queries, e.g. WHERE (tag_id1, tag_id2) IN ((1, 2), (2, 1)).
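To make that cost concrete, here is a minimal sketch using Python's standard-library sqlite3 module (chosen only so the example runs without a database server). The table name tag_relation and the helper related() are hypothetical. With no ordering rule, a pair can be stored either way round, so every existence check has to test both permutations. The OR form below means the same as the row-value IN form above and also works on DBMSs that lack row-value support.

```python
import sqlite3

# Hypothetical table: one row per relationship, stored in whatever order it arrived.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag_relation (tag_id1 INTEGER NOT NULL, tag_id2 INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO tag_relation (tag_id1, tag_id2) VALUES (?, ?)",
    [(1, 2), (3, 1)],  # (3, 1) is the relationship between tags 1 and 3, stored "backwards"
)


def related(conn, a, b):
    """Without a symmetry-breaking rule, both permutations must be searched."""
    row = conn.execute(
        "SELECT 1 FROM tag_relation "
        "WHERE (tag_id1 = ? AND tag_id2 = ?) OR (tag_id1 = ? AND tag_id2 = ?) "
        "LIMIT 1",
        (a, b, b, a),
    ).fetchone()
    return row is not None


print(related(conn, 1, 3))  # True (found as the stored row (3, 1))
print(related(conn, 2, 3))  # False
```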

The trick is to introduce a symmetry-breaking rule, e.g. tag_id1 <= tag_id2. When inserting or updating data, you have to enforce the rule. That's easy if your DBMS supports check constraints; if it doesn't, you can consider using a trigger to do the same.
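A minimal sketch of enforcing the rule with a check constraint, again with Python's sqlite3 and the hypothetical tag_relation table. The composite primary key goes beyond what the answer describes; it is there so the same canonical pair can't be stored twice. A row rejected by either constraint surfaces as sqlite3.IntegrityError, which the hypothetical try_insert() helper turns into a boolean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK enforces the symmetry-breaking rule; the primary key rejects duplicate pairs.
conn.execute(
    """
    CREATE TABLE tag_relation (
        tag_id1 INTEGER NOT NULL,
        tag_id2 INTEGER NOT NULL,
        PRIMARY KEY (tag_id1, tag_id2),
        CHECK (tag_id1 <= tag_id2)
    )
    """
)


def try_insert(conn, a, b):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO tag_relation (tag_id1, tag_id2) VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(conn, 1, 2))  # True
print(try_insert(conn, 3, 1))  # False: violates tag_id1 <= tag_id2
print(try_insert(conn, 1, 2))  # False: duplicate primary key
```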

This simplifies queries - you can sort the arguments you want to search for so that you only have to search for a single permutation, e.g. (1, 2).
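Sorting the arguments amounts to a one-line helper in application code. In this sketch (hypothetical helpers canonical(), relate(), and related(); INSERT OR IGNORE is SQLite-specific syntax), every write and every lookup passes through the same normalization. As a result, only one permutation is ever stored or searched, and the lookup can use the primary-key index.

```python
import sqlite3


def canonical(a, b):
    """Order the pair so that it satisfies tag_id1 <= tag_id2."""
    return (a, b) if a <= b else (b, a)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tag_relation ("
    " tag_id1 INTEGER NOT NULL, tag_id2 INTEGER NOT NULL,"
    " PRIMARY KEY (tag_id1, tag_id2), CHECK (tag_id1 <= tag_id2))"
)


def relate(conn, a, b):
    # OR IGNORE makes relating an already-related pair a no-op (SQLite syntax).
    conn.execute("INSERT OR IGNORE INTO tag_relation VALUES (?, ?)", canonical(a, b))


def related(conn, a, b):
    # A single permutation is enough because every stored row is canonical.
    row = conn.execute(
        "SELECT 1 FROM tag_relation WHERE tag_id1 = ? AND tag_id2 = ?", canonical(a, b)
    ).fetchone()
    return row is not None


relate(conn, 3, 1)          # stored as (1, 3)
relate(conn, 1, 3)          # ignored: same relationship
print(related(conn, 3, 1))  # True
print(conn.execute("SELECT COUNT(*) FROM tag_relation").fetchone()[0])  # 1
```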

Perhaps one day we'll have DBMSs with optimized storage engines for symmetric relationships, trees, and so on.

Answer by Justinas Marozas

I'm not aware of a way to model a symmetric many-to-many table in a relational database that avoids data redundancy and still keeps queries simple.

You could cheat and create a view that duplicates the data at query time. It would look something like this:

CREATE VIEW VW_Friends
AS
SELECT PersonID, FriendID
FROM Friends
UNION
SELECT FriendID, PersonID
FROM Friends;
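For illustration, the view can be exercised with Python's sqlite3 module. The Friends table definition, the sample rows, and the friends_of() helper are assumptions, since the answer doesn't show them. Each friendship is stored once, and the view turns "friends of X" into a single-column lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Friends (PersonID INTEGER NOT NULL, FriendID INTEGER NOT NULL);
    CREATE VIEW VW_Friends AS
        SELECT PersonID, FriendID FROM Friends
        UNION
        SELECT FriendID, PersonID FROM Friends;
    INSERT INTO Friends VALUES (1, 2), (1, 3);
    """
)


def friends_of(person):
    # The view exposes every stored row in both directions, so one column suffices.
    rows = conn.execute(
        "SELECT FriendID FROM VW_Friends WHERE PersonID = ? ORDER BY FriendID", (person,)
    )
    return [r[0] for r in rows]


print(friends_of(1))  # [2, 3]
print(friends_of(2))  # [1]
```

UNION also deduplicates the combined rows, which adds work to every query. If constraints guaranteed that each pair is stored exactly once, in one direction, and that nobody is their own friend, UNION ALL would return the same rows without that step.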

I believe that would be slow and not very intuitive, and I wouldn't generally recommend it, but it is a possible solution.

In your place, I would store the redundant data, because that optimizes for SELECTs, and in most cases a table like this has many more reads than writes.
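Here is a sketch of the redundant approach, using hypothetical helpers add_friendship(), remove_friendship(), and friends_of() on Python's sqlite3. Both directions are written in a single transaction, because the connection's context manager commits on success and rolls back on an exception. This means a failed write can't leave only one direction stored. Reads then become single-column lookups on the primary-key index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Friends (PersonID INTEGER NOT NULL, FriendID INTEGER NOT NULL,"
    " PRIMARY KEY (PersonID, FriendID))"
)


def add_friendship(conn, a, b):
    # Both copies are written together or not at all.
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO Friends (PersonID, FriendID) VALUES (?, ?)",
            [(a, b), (b, a)],
        )


def remove_friendship(conn, a, b):
    # Deletes must also remove both copies.
    with conn:
        conn.execute(
            "DELETE FROM Friends WHERE (PersonID = ? AND FriendID = ?)"
            " OR (PersonID = ? AND FriendID = ?)",
            (a, b, b, a),
        )


def friends_of(conn, person):
    # Reads only look at PersonID, the leading primary-key column.
    rows = conn.execute(
        "SELECT FriendID FROM Friends WHERE PersonID = ? ORDER BY FriendID", (person,)
    )
    return [r[0] for r in rows]


add_friendship(conn, 1, 2)
add_friendship(conn, 1, 3)
print(friends_of(conn, 1))  # [2, 3]
print(friends_of(conn, 3))  # [1]
```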

If that is not the case and you have more writes than reads, don't duplicate the data, and accept awkward SELECTs that query both columns.
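Here is the non-redundant alternative, sketched with Python's sqlite3. Each friendship is one row, so finding a person's friends means querying both columns and combining the results. The friends_of() helper is hypothetical. The second index on FriendID is also an assumption, added so that the second half of the query doesn't fall back to a full table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Friends (PersonID INTEGER NOT NULL, FriendID INTEGER NOT NULL,
                          PRIMARY KEY (PersonID, FriendID));
    -- Index for lookups by FriendID; the primary key already covers PersonID.
    CREATE INDEX idx_friends_friend ON Friends (FriendID, PersonID);
    INSERT INTO Friends VALUES (1, 2), (3, 1);
    """
)


def friends_of(conn, person):
    # Each friendship is stored once, so the person may appear in either column.
    rows = conn.execute(
        "SELECT FriendID FROM Friends WHERE PersonID = ?"
        " UNION "
        "SELECT PersonID FROM Friends WHERE FriendID = ?"
        " ORDER BY 1",
        (person, person),
    )
    return [r[0] for r in rows]


print(friends_of(conn, 1))  # [2, 3]
```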

I hope this helps.