Tag schema to provide functionality similar to Stack Overflow's tag synonyms


I'm currently designing a database for a small bookmarking application (using MySQL) and I'd like to do something clever with the tag system. Although it's not an issue initially, I want to implement something similar to Stack Overflow's tag synonyms, where each tag can have multiple sub-tags that map to it. This would allow tag searches for 'hi' to return bookmarks tagged with 'hello' for example.

I'm familiar with building a many-to-many tag system in which you have three tables: 'tags', 'posts' and 'posts_tags' and I'd like to make synonyms fit in with this.

My initial thoughts were that each tag could have a 'parent' field which contains the ID of the tag it maps to. However, this could cause a lot of orphaned tags and would be a nightmare to manage; I'm looking for speed and elegance.

If anybody has any ideas/guidance it'd be much appreciated! Thanks


There are 2 best solutions below

2
On BEST ANSWER

You could use a child-to-parent table. For example:

Tags
tagId, pk

ChildToParentTags
childTagId, pk, fk (pk of this table, fk into Tags table)
parentTagId, fk (fk into Tags table, have an index for this column).

Post
postId, pk

assume many-to-many post to tag relationship
PostToTag
postId, pk
tagId, pk

Using childTagId as the pk of the ChildToParentTags table limits a tag to 0 or 1 parent, but allows a parent to have multiple children.
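
A minimal sketch of that constraint, assuming an in-memory SQLite database as a stand-in for MySQL so it can be run as-is. The `name` column and the sample tags are hypothetical additions for illustration; the answer itself only specifies the key columns.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Tags (
    tagId INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE          -- hypothetical column, not in the answer
);
CREATE TABLE ChildToParentTags (
    childTagId  INTEGER PRIMARY KEY REFERENCES Tags(tagId),
    parentTagId INTEGER NOT NULL REFERENCES Tags(tagId)
);
CREATE INDEX idx_parentTagId ON ChildToParentTags(parentTagId);
""")

conn.executemany(
    "INSERT INTO Tags (tagId, name) VALUES (?, ?)",
    [(1, "hello"), (2, "hi"), (3, "hey"), (4, "greetings")],
)

# One parent may have several children: 'hi' and 'hey' both map to 'hello'.
conn.executemany(
    "INSERT INTO ChildToParentTags (childTagId, parentTagId) VALUES (?, ?)",
    [(2, 1), (3, 1)],
)

# A child may have at most one parent: a second row for 'hi' violates the pk.
try:
    conn.execute(
        "INSERT INTO ChildToParentTags (childTagId, parentTagId) VALUES (2, 4)"
    )
    second_parent_allowed = True
except sqlite3.IntegrityError:
    second_parent_allowed = False

children_of_hello = [
    row[0]
    for row in conn.execute(
        "SELECT childTagId FROM ChildToParentTags "
        "WHERE parentTagId = 1 ORDER BY childTagId"
    )
]
```

Here the second insert for 'hi' is rejected by the primary key, while 'hello' keeps both of its children.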

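Query for posts by tag:

-- otherStuff and something are placeholders for your own columns.
-- This matches the tag exactly; it does not follow ChildToParentTags.
select
 Post.postId,
 Post.otherStuff
from
 Post
  inner join PostToTag on
   Post.postId = PostToTag.postId
  inner join Tags on
   PostToTag.tagId = Tags.tagId
where
 Tags.something = 'desired tag value'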
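
To make a search for 'hi' also return posts tagged 'hello', the query can compare canonical tags instead of raw tags. The following is a sketch under some assumptions, not part of the original answer: SQLite in memory stands in for MySQL, the `name` column and sample rows are hypothetical, and the hierarchy is only one level deep (a parent tag is never itself a child).

```python
import sqlite3

# SQLite in memory stands in for MySQL here (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (tagId INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE ChildToParentTags (
    childTagId  INTEGER PRIMARY KEY REFERENCES Tags(tagId),
    parentTagId INTEGER NOT NULL REFERENCES Tags(tagId)
);
CREATE TABLE Post (postId INTEGER PRIMARY KEY);
CREATE TABLE PostToTag (
    postId INTEGER NOT NULL REFERENCES Post(postId),
    tagId  INTEGER NOT NULL REFERENCES Tags(tagId),
    PRIMARY KEY (postId, tagId)
);
""")

# Hypothetical sample data: 'hi' is a synonym (child) of 'hello'.
conn.executemany("INSERT INTO Tags VALUES (?, ?)",
                 [(1, "hello"), (2, "hi"), (3, "sql")])
conn.execute("INSERT INTO ChildToParentTags VALUES (2, 1)")
conn.executemany("INSERT INTO Post VALUES (?)", [(10,), (11,), (12,)])
conn.executemany("INSERT INTO PostToTag VALUES (?, ?)",
                 [(10, 1), (11, 2), (12, 3)])

# Resolve both the searched tag and each post's tag to their canonical tag
# (the parent if one exists, otherwise the tag itself) and compare those.
SYNONYM_QUERY = """
SELECT DISTINCT p.postId
FROM Post p
JOIN PostToTag pt ON pt.postId = p.postId
LEFT JOIN ChildToParentTags c ON c.childTagId = pt.tagId
WHERE COALESCE(c.parentTagId, pt.tagId) = (
    SELECT COALESCE(c2.parentTagId, t.tagId)
    FROM Tags t
    LEFT JOIN ChildToParentTags c2 ON c2.childTagId = t.tagId
    WHERE t.name = ?
)
ORDER BY p.postId
"""

def posts_for_tag(name):
    """Return the ids of posts tagged with `name` or any of its synonyms."""
    return [row[0] for row in conn.execute(SYNONYM_QUERY, (name,))]
```

With this data, `posts_for_tag("hi")` and `posts_for_tag("hello")` both return posts 10 and 11, while `posts_for_tag("sql")` returns only post 12. The SQL itself uses only joins and `COALESCE`, so it should carry over to MySQL largely unchanged.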
0
On

You can use a system that assigns tags to groups. A parent-child relationship like the one described above does not help much, whereas a group relationship makes searching faster.

Create a table called groups with these columns:

id, name, groupid

All elements that are synonyms of one another should be assigned the same groupid (which can be a number assigned by your code). Whenever a new element joins a group, or an existing element moves from one group to another, all you have to do is update its groupid.

This makes searching faster because a lookup only has to match on a single groupid. All elements with the same groupid can be found without the need for an IN clause.

I am assuming that this table will have an FK relationship with some other table. Wherever you would have a PK-FK relation on "id", you can use "groupid" for the FK relationship instead.
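
A rough sketch of the group idea, again assuming SQLite in memory as a stand-in for MySQL. The table is named `tag_groups` here rather than `groups` because GROUPS is a reserved word in recent MySQL versions; `post_tags` and the sample rows are hypothetical, and the link on groupid is left as a plain indexed column rather than a declared foreign key.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag_groups (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE,
    groupid INTEGER NOT NULL          -- synonyms share the same groupid
);
CREATE INDEX idx_tag_groups_groupid ON tag_groups(groupid);

-- Hypothetical link table: posts reference the group, not the single tag.
CREATE TABLE post_tags (
    postId  INTEGER NOT NULL,
    groupid INTEGER NOT NULL,
    PRIMARY KEY (postId, groupid)
);
""")

conn.executemany("INSERT INTO tag_groups (id, name, groupid) VALUES (?, ?, ?)",
                 [(1, "hello", 1), (2, "hi", 1), (3, "sql", 2)])
conn.executemany("INSERT INTO post_tags (postId, groupid) VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 2)])

def posts_for_tag(name):
    """Return the ids of posts whose group matches the group of `name`."""
    rows = conn.execute(
        """
        SELECT pt.postId
        FROM post_tags pt
        JOIN tag_groups g ON g.groupid = pt.groupid
        WHERE g.name = ?
        ORDER BY pt.postId
        """,
        (name,),
    )
    return [row[0] for row in rows]

before = posts_for_tag("hey")   # 'hey' is not a known tag yet

# Adding a synonym only requires giving the new element the group's id.
conn.execute("INSERT INTO tag_groups (id, name, groupid) VALUES (4, 'hey', 1)")
after = posts_for_tag("hey")
```

One trade-off of this layout is that `post_tags` no longer records which exact synonym was used on a post, only its group, so it may be worth keeping the original tag id as well if that matters for display.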