Amazon SimpleDB & DynamoDB for storing blog posts


Consider a simple blog post schema with the following columns:

ID 
Author 
Category 
Status 
CreatedDateTime
UpdatedDateTime

Assume the following queries:

  • query by ID
  • query by Author, paginated
  • query by (Author, Status), sorted by CreatedDateTime, paginated
  • query by (Category, Status), sorted by CreatedDateTime, paginated

It seems that SimpleDB would be easier to implement this with, without much extra work. Is that right?

2

There are 2 best solutions below

1
E.J. Brennan On

SimpleDB is barely supported by AWS anymore; you can't even find it in the AWS console. While it may work for you, I would personally be deciding between DynamoDB and DocumentDB (assuming you want NoSQL). I don't think there is any reason to start a new project on such an old offering at this point.

3
Matthew Pope On

You should use DynamoDB because it has a lot of useful features such as Point in Time Recovery, transactions, encryption-at-rest, and activity streams that SimpleDB does not have.

If you're operating on a small scale, DynamoDB has the advantage that it allows you to set a maximum capacity for your table, which means you can make sure you stay in the free tier.

If you're operating at a larger scale, DynamoDB automatically handles all of the partitioning of your data (and has, for all practical purposes, limitless capacity), whereas SimpleDB has a limit of 10 GB per domain (aka "table") and you are required to manage any horizontal partitioning across domains that you might need.

Finally, there are signs that SimpleDB is already on a deprecation path. For example, if you look at the SimpleDB release notes, you will see that the last update was in 2011, whereas DynamoDB had several new features announced at the last re:Invent conference. Also, there are a number of Reddit posts (such as here, here, and here) where the general consensus is that SimpleDB is already deprecated, and in some of the threads, Jeff Barr even commented and did not contradict any of the assertions that SimpleDB is deprecated.


That being said, DynamoDB can support your desired queries. You will need two Global Secondary Indexes, both of which use the same composite sort key. Your queries can be supported with the following schema:

  • ID — hash key of your table
  • Author — hash key of the Author-Status-CreatedDateTime-index
  • Category — hash key of the Category-Status-CreatedDateTime-index
  • Status
  • CreatedDateTime
  • UpdatedDateTime
  • Status-CreatedDateTime — sort key of Author-Status-CreatedDateTime-index and Category-Status-CreatedDateTime-index. This is a composite attribute that exists to enable some of your queries. It is simply the value of Status with a separator character (I'll assume it's # for the rest of this answer), and CreatedDateTime appended to the end. (Personal opinion here: use ISO-8601 timestamps instead of unix timestamps. It will make troubleshooting a lot easier.)
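
As a rough illustration of the composite attribute described above, here is a minimal Python sketch that builds an item in DynamoDB's low-level attribute-value format. The helper names (`iso_utc`, `make_post_item`) and the sample values are hypothetical, and the sketch assumes timezone-aware datetimes and the `#` separator mentioned above.

```python
from datetime import datetime, timezone

SEPARATOR = "#"  # separator assumed in the answer above


def iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as a fixed-width ISO-8601 UTC string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def make_post_item(post_id, author, category, status, created, updated):
    """Build a blog post item in DynamoDB's low-level attribute-value format."""
    return {
        "ID": {"S": post_id},
        "Author": {"S": author},
        "Category": {"S": category},
        "Status": {"S": status},
        "CreatedDateTime": {"S": iso_utc(created)},
        "UpdatedDateTime": {"S": iso_utc(updated)},
        # Composite attribute used as the sort key of both indexes.
        "Status-CreatedDateTime": {"S": f"{status}{SEPARATOR}{iso_utc(created)}"},
    }


# Hypothetical sample post.
created = datetime(2019, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
item = make_post_item("post-1", "alice", "technology", "published", created, created)
print(item["Status-CreatedDateTime"]["S"])  # published#2019-05-01T12:00:00Z
```

Because every timestamp has the same width and time zone, string order matches chronological order. Note that the composite attribute has to be rewritten by your application whenever Status changes; DynamoDB does not maintain it for you.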

Using this schema, you can satisfy all of your queries.
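
For reference, the schema above could be expressed as the keyword arguments of a `create_table` call, roughly as in the sketch below. The table name `BlogPosts`, the string attribute types, the `ALL` projection, and on-demand billing are assumptions made to keep the example short; with provisioned capacity you would instead add `ProvisionedThroughput` to the table and to each index.

```python
SORT_KEY = "Status-CreatedDateTime"


def gsi(hash_attribute: str) -> dict:
    """Describe one Global Secondary Index that uses the shared composite sort key."""
    return {
        "IndexName": f"{hash_attribute}-{SORT_KEY}-index",
        "KeySchema": [
            {"AttributeName": hash_attribute, "KeyType": "HASH"},
            {"AttributeName": SORT_KEY, "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},  # assumption: copy every attribute
    }


TABLE_DEFINITION = {
    "TableName": "BlogPosts",  # hypothetical table name
    "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand billing
    # Only attributes that take part in a key are declared here.
    "AttributeDefinitions": [
        {"AttributeName": "ID", "AttributeType": "S"},
        {"AttributeName": "Author", "AttributeType": "S"},
        {"AttributeName": "Category", "AttributeType": "S"},
        {"AttributeName": SORT_KEY, "AttributeType": "S"},
    ],
    "KeySchema": [{"AttributeName": "ID", "KeyType": "HASH"}],
    "GlobalSecondaryIndexes": [gsi("Author"), gsi("Category")],
}

# With boto3 installed and credentials configured, this could be applied with:
#   import boto3
#   boto3.client("dynamodb").create_table(**TABLE_DEFINITION)
```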

query by ID: Simply perform a GetItem request on the main table using the blog post ID.

query by Author, paginated: Perform a Query on the Author-Status-CreatedDateTime-index with a key condition expression of Author = :author.

query by (Author, Status), sorted by CreatedDateTime, paginated: Perform a Query on the Author-Status-CreatedDateTime-index with a key condition expression of Author = :author and begins_with(Status-CreatedDateTime, :status). The results will be returned in order of ascending CreatedDateTime.
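
A sketch of the request parameters for the (Author, Status) query might look like the following. The function name, the `BlogPosts` table name, and the page size are hypothetical. Two details go slightly beyond the text above: attribute names are passed through `ExpressionAttributeNames` placeholders because a name containing a hyphen cannot be written directly in an expression, and the separator is included in the prefix so that one status cannot accidentally match another status that merely starts with the same letters.

```python
def author_status_query(author: str, status: str, page_size: int = 20) -> dict:
    """Build Query parameters for posts by one author with one status."""
    return {
        "TableName": "BlogPosts",  # hypothetical table name
        "IndexName": "Author-Status-CreatedDateTime-index",
        "KeyConditionExpression": "#author = :author AND begins_with(#sc, :prefix)",
        # Placeholders keep the hyphenated attribute name out of the expression.
        "ExpressionAttributeNames": {
            "#author": "Author",
            "#sc": "Status-CreatedDateTime",
        },
        "ExpressionAttributeValues": {
            ":author": {"S": author},
            ":prefix": {"S": f"{status}#"},
        },
        "Limit": page_size,  # maximum number of items evaluated per page
    }


params = author_status_query("alice", "published")
# With boto3 installed and credentials configured:
#   import boto3
#   response = boto3.client("dynamodb").query(**params)
```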

query by (Category, Status), sorted by CreatedDateTime, paginated: Perform a Query on the Category-Status-CreatedDateTime-index with a key condition expression of Category = :category and begins_with(Status-CreatedDateTime, :status). The results will be returned in order of ascending CreatedDateTime. (Additionally, if you wanted to get all the blog posts in the "technology" category that have the status published and were created in 2019, you could use a key condition expression of Category = "technology" and begins_with(Status-CreatedDateTime, "published#2019").)
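
To make the prefix trick concrete, the sketch below builds the request for the Category index with an optional creation-date prefix, then imitates `begins_with` on a few sample sort keys in plain Python. The function name, the table name, and the sample keys are made up for illustration.

```python
def category_status_query(category: str, status: str, created_prefix: str = "") -> dict:
    """Build Query parameters for one category and status, optionally narrowed by date prefix."""
    return {
        "TableName": "BlogPosts",  # hypothetical table name
        "IndexName": "Category-Status-CreatedDateTime-index",
        "KeyConditionExpression": "#category = :category AND begins_with(#sc, :prefix)",
        "ExpressionAttributeNames": {
            "#category": "Category",
            "#sc": "Status-CreatedDateTime",
        },
        "ExpressionAttributeValues": {
            ":category": {"S": category},
            ":prefix": {"S": f"{status}#{created_prefix}"},
        },
    }


params = category_status_query("technology", "published", "2019")
prefix = params["ExpressionAttributeValues"][":prefix"]["S"]  # "published#2019"

# Offline illustration: the same prefix applied to made-up sort keys.
sample_keys = [
    "published#2019-07-04T09:30:00Z",
    "draft#2019-03-01T08:00:00Z",
    "published#2018-12-31T23:59:59Z",
    "published#2019-01-15T10:00:00Z",
]
matches = [key for key in sorted(sample_keys) if key.startswith(prefix)]
print(matches)
# ['published#2019-01-15T10:00:00Z', 'published#2019-07-04T09:30:00Z']
```

Longer prefixes such as `"2019-07"` would narrow the results to a single month in the same way.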

The sort order of the results can be controlled using the ScanIndexForward field of the Query request. The default is true (sort ascending); but by setting it to false DynamoDB will return results in descending order.

DynamoDB has built-in support for paginating the results of a Query operation. Whenever there are more results than were returned, the query response will contain a LastEvaluatedKey, which you can pass as the ExclusiveStartKey of your next query request to pick up where you left off. (See Query Pagination for more details about how it works.)
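
The pagination loop can be sketched as a small generator. `iter_query_pages` is a hypothetical helper; it assumes a client object whose `query` method behaves like the boto3 DynamoDB client, returning `Items` and, when more results remain, `LastEvaluatedKey`.

```python
from typing import Any, Dict, Iterator, List


def iter_query_pages(client: Any, **query_kwargs: Any) -> Iterator[List[Dict[str, Any]]]:
    """Yield one list of items per page until DynamoDB reports no more results."""
    kwargs = dict(query_kwargs)  # copy so the caller's arguments are not mutated
    while True:
        response = client.query(**kwargs)
        yield response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return  # no key means this was the final page
        # Resume the next request right after the last item that was evaluated.
        kwargs["ExclusiveStartKey"] = last_key


# Example usage with boto3 (requires credentials; names are hypothetical):
#   import boto3
#   client = boto3.client("dynamodb")
#   for page in iter_query_pages(
#       client,
#       TableName="BlogPosts",
#       IndexName="Author-Status-CreatedDateTime-index",
#       KeyConditionExpression="#author = :author",
#       ExpressionAttributeNames={"#author": "Author"},
#       ExpressionAttributeValues={":author": {"S": "alice"}},
#       ScanIndexForward=False,  # newest first within each status
#       Limit=20,
#   ):
#       print(len(page))
```

In a web application you would more likely run a single request per page view and hand the LastEvaluatedKey back to the caller as an opaque cursor, rather than looping through every page at once.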


On the other hand, if you're already familiar with SQL, and you want to make this as easy for yourself as possible, consider just using the Aurora Serverless Data API.