MongoDB Schema: Nested, Flattened, or Independent Collections?

342 Views Asked by At

We are writing an application in which we have multiple 'Projects'. Each 'Project' has multiple 'Boards'. Each 'Board' has its own set of 'Comments'. What is the recommended way to structure this in MongoDB?

= Option I (nested collection)
  -Project
    |
    |----- Board
             |
             |----- Comments


= Option II (flattened collection)
  -Project
    |
    |----- Board
    |
    |----- Comment
              |-----Board_ID


= Option III (independent collections)
  -Project

  - Boards
      |-----Project_ID

  - Comments
      |-----Board_ID   

There are 10,000 projects. Each project has 5 Boards, so total boards is 50,000. Each Board has 20 comments, so total comments are '1,000,000. Only one project, and one board can be open in the application at one time.

So, if we pick Option I, then to get the associated 'Comments' for a particular project/board combination, we will have to query/parse through only 20 comments. However, if I pick Option III, then, to get the associated 'Comments' for a given project/board combination, we will have to query/parse through 1,000,000 comments. So, in theory, Option I sounds faster and more efficient. However, Option I uses a nested collection: Is there any dis-advantages on a nested collection? Are there any reasons for not using nested collections in MongoDB, like Option I?

MongoDB experts: What Option (I, II, or III), is the recommended practice for such cases?

1

There are 1 best solutions below

0
On

Probably the most important question is: What do you read and write together?

Only one project, and one board can be open in the application at one time.

So basically 1 project with its 20 comments are mainly read and written together? Then I'd store them in one document (embedded comments) and have a projects collection pointing to the boards collection.

Background:

  • Even if you read a single attribute from a document, you're always fetching the whole document from disk and load it into RAM. Even if you limit the query to the single attribute, you'll load it — you just won't send it over the network.
  • Since there are no (multi-document) transactions, put things into a single document, which you want to write atomically.
  • Avoid growing documents, since one document needs to be stored as a single block on disk and moves are expensive. Either preallocate values (if possible) or use separate documents for stuff you want to add later on.