How can I make a efficient query to get all sorted by LastPostedDate threads from more than one group?

39 Views Asked by At

I am using Single Table in DynamoDB and Group and Thread are stored in one table. One-to-many-relation. All threads are sorted by lastPostedAt. lastPostedAt is updated if a new comment or a new suggestion is posted in Thread. PK=Group#{id} SK=Thread#{id}. it is no problem to get all threads in a group. GIS PK=GroupID SK=Thread#lastPostedAt

Access Pattern: Get all threads from more than one group. all threads are sorted by lastPostedAt.

How should I model for that access pattern with GIS?

Is it ok with large partitation key and Filterexpression? GIS PK=Thread (without id) SK=lastPostedAt Filterexpression=GroupID In ('Group1','Group2','Group3')

With filter expression, there can be difficulties in pagination if 'Group4' has a large number of elements compared to 'Group1', 'Group2' and 'Group3'.

If a group has a much larger number of elements than the other groups, this can lead to imbalances in pagination. If, for example, 'Group4' contains 1000 elements and 'Group1', 'Group2' and 'Group3' contain only 100 elements, it can be difficult to achieve a even distribution of the pagination across all groups.

schema.graphql

type Group
{
    id: ID!
    users: [User]
    admins: [User]
    title: String
}
type Thread
{
    id: ID!
    groupId: ID!
    term: String
    owner: user
    lastPostedAt: AWSDateTime
}
2

There are 2 best solutions below

0
Tofig Hasanov On BEST ANSWER

I think the right answer here is that DynamoDB is not a good choice for this use case. Consider using a different DB technology or a secondary storage that builds your sorted view.

That being said, using a static hash key “Thread” is a possible option but would introduce scaling limits, as a single partition in DybamoDB can generally support only up to 3000 RSU and 1000 WSU per second. If you are sure that you’ll never hit that limit then go ahead.

Keep in mind that filtering on groups will not reduce amount of records that will be processed by DynamoDB during query. It only impacts returned result, but you will still be charged for all records (and performance will also suffer)

2
Leeroy Hannigan On

Option 1

Use GroupId as the GSI partition key and lastPostedAt as sort key. Make multiple requests for each group you need, sorting the results on the client side.

Option 2

Use a global sort order. Define a static value as partition key, and lastPostedAt as sort key. This will order all the items in the table by lastPostedAt but means you would filter on group's that you need.

gsipk lastPostedAt group
1 2023-07-01T00:00:000 1
1 2023-07-01T01:00:000 4
1 2023-07-01T02:00:000 5
1 2023-07-01T03:00:000 1
1 2023-07-02T00:00:000 4
1 2023-07-02T11:00:000 7