I have a DynamoDB table with many attributes and a global_secondary_index defined on one of them. This table contains millions of items, and I need to do a query which on SQL would be something like: Select * from table where field1 is IN (value1,..., valuen). field1 is the indexed one.
Note:
- I have 10k value to check in the IN
- I don't want to scan the table, but use the global secondary index
So far I have not been able to find any easy way to do that, either via DynamoDB AWS console or via the AWS programmatic API.
I have noticed that PartiQL offers such syntax, and I am curious if it does a table scan under the hood or just fetch the items one by one. (Anyway, PartiQL is currently not available for me)
As a workaround, I did a script to return the items one by one.
I have read such post, which does not help here: Querying DynamoDB with a partition key and list of specific sort keys
Finally I would also be curious to understand possibly why such feature does not exist natively (what are the technical constraints if any)?
Thanks
The way PartQL would do the following
Is simply that behind the scenes it basically ends up as two queries to DDB.
I can't find any documented limit with a quick search, but I suspect that 10K values would be outside that limit.
The thing to understand is that a Query in DDB can only touch 1 partition. A "table scan" is broken into individual scans of each partition; sequentially by default but can be done in parallel
Also that a GSI is basically just a separate DDB table with slightly different rules (duplicate Primary Keys allowed) than the main table.
So in short, you need to write some code that would loop through the 10K values and run a query of the GSI for each value.