CosmosDB NoSQL: Data structure


I am trying to make my first NoSQL design but I am struggling a bit to find the right path for my use case.

In short, a user is a member of a tenant, and the tenant has a set of products that can only be accessed by its members. It must be possible to filter these products, as on a normal e-commerce site, but here the products belong to a tenant. The users can accumulate a lot of products over a long period of time, so I think a single partition could become an issue at some point. I also want to avoid a hot partition with heavy read traffic if the tenant has many users and thousands of products.

My biggest question is how to get the partition key right.

  • When I have some user-specific data, should the PK just be the tenant_id, with the data located in the tenant container? For example, if the tenant has its own categories and tags, these could live in the tenant partition with a type property whose values are category, tag, etc. That way all the data that stays fairly limited, about 100-200 documents, can be in the same collection with the same partition key.

  • For the product case I am a bit out of my depth. I hope someone has a good idea for a PK in a scenario where the products are only accessed by a single tenant, have to be ordered, paged, and displayed, and must be filterable on price, tags, and categories, while it must also be possible to show a single product with all its detailed information.

    • The best idea I have so far is to use a GUID per product as the PK and then build a summary for a tenant with multiple products inside a single document, but I am not sure how to structure this. Does anyone have a good article that explains this approach?

There is 1 solution below


It's tricky to give a prescriptive "right" answer with this topic given the number of variables. One thing to note is the partitioning strategy also depends on your tenant isolation approach. If you isolate tenants by container, you may end up with a different approach than using a single container design.

For what it's worth, I documented some of the thinking behind the design of a recent project here, along with a presentation:

https://whally.com/blog/how-it-was-made

There you'll see that the partition key is synthetic, with the tenant ID as prefix and the item type as suffix. This has worked well, on the assumption that no partition will ever hold more than 20 GB of data (the Cosmos DB limit for a single logical partition). That's comfortably true for this case, but you'd need to determine that for yours.
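As a rough illustration of that kind of synthetic key, here is a minimal Python sketch. The property names `pk` and `type`, the `_` separator, and both helper names are assumptions made for this example, not something taken from the linked write-up.

```python
from typing import Any, Dict


def make_partition_key(tenant_id: str, item_type: str, sep: str = "_") -> str:
    """Build a synthetic partition key: tenant ID as prefix, item type as suffix."""
    if not tenant_id or not item_type:
        raise ValueError("tenant_id and item_type must be non-empty")
    return f"{tenant_id}{sep}{item_type}"


def make_item(tenant_id: str, item_type: str, item_id: str, **fields: Any) -> Dict[str, Any]:
    """Return a document body carrying the synthetic key in a 'pk' property.

    'pk' and 'type' are hypothetical property names; the container would
    need to be created with '/pk' as its partition key path.
    """
    item: Dict[str, Any] = dict(fields)
    item["id"] = item_id
    item["type"] = item_type
    item["pk"] = make_partition_key(tenant_id, item_type)
    return item


category = make_item("tenant42", "category", "cat-1", name="Shoes")
product = make_item("tenant42", "product", "prod-1", name="Sneaker", price=79.0)
```

With this layout, a tenant's categories and its products end up in different logical partitions (`tenant42_category` and `tenant42_product`), so each item type gets its own 20 GB budget. If tenant IDs can themselves contain the separator, pick a different one to keep the keys unambiguous.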

For products, my hunch would be to place them all in a single partition per tenant, assuming you're not in danger of hitting 20 GB there. That gives you the most freedom, because every product query stays scoped to a single partition.
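To make the "single-partition scoped" part concrete, here is a hedged sketch of a helper that builds a parameterized Cosmos DB SQL query for the filters mentioned in the question. The document shape (`type`, `price`, `tags` as an array, `category`) and the function name are assumptions for this example.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_product_query(
    min_price: Optional[float] = None,
    max_price: Optional[float] = None,
    tag: Optional[str] = None,
    category: Optional[str] = None,
    descending: bool = False,
) -> Tuple[str, List[Dict[str, Any]]]:
    """Return (query text, parameters) for a filtered, price-ordered product list."""
    clauses = ["c.type = @type"]
    params: List[Dict[str, Any]] = [{"name": "@type", "value": "product"}]

    if min_price is not None:
        clauses.append("c.price >= @minPrice")
        params.append({"name": "@minPrice", "value": min_price})
    if max_price is not None:
        clauses.append("c.price <= @maxPrice")
        params.append({"name": "@maxPrice", "value": max_price})
    if tag is not None:
        # Assumes each product stores its tags as an array of strings.
        clauses.append("ARRAY_CONTAINS(c.tags, @tag)")
        params.append({"name": "@tag", "value": tag})
    if category is not None:
        clauses.append("c.category = @category")
        params.append({"name": "@category", "value": category})

    where = " AND ".join(clauses)
    order = "DESC" if descending else "ASC"
    return f"SELECT * FROM c WHERE {where} ORDER BY c.price {order}", params


query, parameters = build_product_query(max_price=50, tag="sale")
```

With the azure-cosmos Python SDK, the result would typically be passed to `container.query_items(query=query, parameters=parameters, partition_key=...)`, with the tenant's product partition key value as `partition_key`. Supplying the partition key is what keeps the query inside one logical partition rather than fanning out across all of them.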

There's always the option to add additional materialized view data to optimize for certain queries. So your partition key doesn't necessarily need to result in optimal queries for every case.
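One possible shape for such a materialized view is the per-tenant summary document the question mentions. This is only a sketch: the `pk` and `type` properties, the ID scheme, and the choice of summary fields are all assumptions.

```python
from typing import Any, Dict, List


def build_product_summary(tenant_id: str, products: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Condense full product documents into one lightweight list document."""
    return {
        "id": f"{tenant_id}_product_summary",
        "pk": f"{tenant_id}_summary",  # hypothetical synthetic partition key
        "type": "product_summary",
        # Keep only the fields a list page needs; details stay in the product item.
        "products": [
            {"id": p["id"], "name": p.get("name"), "price": p.get("price")}
            for p in products
        ],
    }


summary = build_product_summary(
    "tenant42",
    [{"id": "prod-1", "name": "Sneaker", "price": 79.0, "description": "Long text"}],
)
```

A single Cosmos DB item is limited to 2 MB, so a tenant with thousands of products would likely need the summary split across several documents. The view also has to be refreshed whenever a product changes, for example from the change feed.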