I have a dynamodb table that contains a column/attributes with special letter characters, e.g. é, ë and so on.
Is there a way to find the entries with just the "normal" letters (e.g. e).
scancommand looks like this
input = {
"TableName": tableName,
"FilterExpression": "contains(#name, :term)",
"ExpressionAttributeNames": {
"#name":"name",
},
"ExpressionAttributeValues": {
':term': event.pathParameters.term,
}
};
and should for example find the name "Zoe", even when the table contains "Zoë".
No, DynamoDB has a fairly limited selection of functions in
FilterExpression, and one of them -contains()only supports exact substring matches.One commonly used solution is to denormalize the data, add another copy of the string which is more suitable for the search you need to do: In this case in addition to the "name" attribute you add to each item, add a new attribute, say "name_canonical" where if name was
Zoë, name_canonical will be set tozoe. You can then do a filter on the name_canonical attribute. Obviously, each time you write an item or modify the name in it, you'll also need to update name_canonical accordingly.Finally I want to comment that what you are asking about doing is very inefficient: You are basically scanning the entire database, and doing string matching on everything. You'll be paying Amazon the cost of reading your entire database. A much more efficient solution would be a have the name_canonical as a key of the table (either of the original table or a GSI), and then instead of
FilterExpressionyou'd useKeyConditionExpression- which will find just the data with the requested name, immediately, without paying to read the entire database.If you'll use
KeyConditionExpressionlike I recommend, you'll see you won't be able to usecontains()any more. You'll need to take the denormalization yet another step: For example, ifnameisZoë Smith, you can create multiplename_canonicalentries, one underzoe, one undersmith, so that searching for the keyzoewill find you the entries you want.Search engines do what I just described - splitting a string into separate words in their canonical forms - a process often called "lemmatization" - as part of their usual interface. But DynamoDB is not a search engine, so you need to do this on your own, or, like Leeroy said, integrate with yet another AWS product, a search engine (they recently announced a way to do this conveniently - see https://aws.amazon.com/blogs/aws/amazon-dynamodb-zero-etl-integration-with-amazon-opensearch-service-is-now-generally-available/).