I've spent my entire career working with denormalized relational databases. I am having a hard time un-learning all of that in order to implement a single-table design that can handle a couple specific access patterns on an "App Store"-like personal project.
Here's a quick ERD. There is an App model identified by a platform (iOS, Android) and bundle identifier along with a Defaults map that is used when creating new versions. Each App can have 0 to many Versions which are identified by a version number (which is a sequential numerical value and is unique within the context of an App). A version has an IsReleased attribute along with several others (like Name, Release Notes, Binary Path, etc).
Access Patterns
- List the latest version of every app.
- List the latest version of every app for a given platform.
- List the latest version of every app where IsReleased is 1.
- List the latest version of every app for a given platform where IsReleased is 1.
- Get the latest version of a specific app.
- Get the latest version of a specific app where IsReleased is 1.
- Get all versions of a specific app.
- Get all versions of a specific app where IsReleased is 1.
- Get the Default attribute for a specific app.
I'm having trouble with 1 though 4, this table is where I was headed. I'm having a hard time coming with a GSIs that will give me the all of the app items with a single version by sort order.
pk | sk | Defaults | App Name | Version | IsReleased | Other Attributes |
---|---|---|---|---|---|---|
app_ios_com.app.one |
defaults |
{ ... json ... } |
||||
app_ios_com.app.one |
version_1 |
App One | 1 | 1 | ||
app_ios_com.app.one |
version_2 |
App One | 2 | 1 | ||
app_ios_com.app.one |
version_3 |
App One | 3 | 1 | ||
app_ios_com.app.two |
defaults |
{ ... json ... } |
||||
app_ios_com.app.two |
version_1 |
App Two | 1 | 1 | ||
app_ios_com.app.two |
version_2 |
App Two | 2 | 0 | ||
app_ios_com.app.two |
version_3 |
App Two | 3 | 0 |
For example, for access pattern 1, I want:
pk | sk | Defaults | App Name | Version | IsReleased | Other Attributes |
---|---|---|---|---|---|---|
app_ios_com.app.one |
version_3 |
App One | 3 | 1 | ||
app_ios_com.app.two |
version_3 |
App Two | 3 | 0 |
For example, for access pattern 3, I would want:
pk | sk | Defaults | App Name | Version | IsReleased | Other Attributes |
---|---|---|---|---|---|---|
app_ios_com.app.one |
version_2 |
App One | 3 | 1 | ||
app_ios_com.app.two |
version_1 |
App Two | 1 | 1 |
Some data constraints that I have to keep in mind:
- There are currently only 10 to 20 apps, but I need to be able to support hundreds
- Most apps will have 100 to 200 versions with 20 to 30 released versions. The biggest app has 1000 versions of which 50 are released.
- In the back-end, the IsReleased flag will typically be toggled from 0 to 1, but will occasionally be toggled from 0 to 1.
- The average version item is approximately 2 KB.
- The access pattern variations where IsReleased is 1 are more frequently used by a significant margin.
I feel like the solution is right in front of me, but I can't put my finger on it.
TLDR; The solution that springs to mind is a leaderboard pattern to cache the latest app versions in separate record(s). Whenever a new version is added, DynamoDB Streams sends the change as an event to lambda, which then updates the denormalised Latest records.
Note: One piece of information was missing from your excellent write-up: how often do you need to perform the
latest
queries? If not very often, then "scan-and-done" will be OK for your current volumes. If the answer is 1klatest
queries per minute, then it's a different story. The good news is that your basic table design is solid.Latest
query optimisation can be implemented incrementally when performance/cost problems arise, without messing with the table design.Denormalising the Latest Versions
We will keep denormalised copie(s) of the latest versions, another sinful-sounding DynamoDB pattern. The Stream-triggered lambda will update those records using the update API when a version is added or changes release status. How to store the latest version info? We have several options:
latest
data in a singleton record with map attributes{app1: {latest version copy}, app2: ...}
. You can put more logic into the records to handle theisReleased
items, or simply fetch the record and filter in your backend.app_id
. The records have the same info as in #1.GSI1PK=Latest#Released AND begins_with(GSI1SI, "IOS")
Note: If you have high query volumes and low cardinality, hot partitions may be a problem for these "leaderboard" type deormalised patterns. If this becomes a problem, you can address it by keeping multiple copies of each "latest" record, e.g. have X copies that are queried randomly
latest-copy1
,latest-copy2
,latest-copy3
. Amazon calls this pattern sharding using calculated suffixes.