I'm designing an Event Store on AWS and I chose DynamoDB because it seemed the best option. My design seems to be quite good, but I'm facing some issues that I can't solve.
**The design**
Events are uniquely identified by the pair (StreamId, EventId):

- **StreamId**: the same as the aggregateId, which means one Event Stream per Aggregate.
- **EventId**: an incremental number that keeps the ordering inside the same Event Stream.
Events are persisted in DynamoDB. Each event maps to a single record in a table whose mandatory fields are StreamId, EventId, EventName and Payload (more fields can be added easily).
The partitionKey is the StreamId, the sortKey is the EventId.
Optimistic locking is used while writing an event to an Event Stream. To achieve this, I'm using DynamoDB conditional writes: if an event with the same (StreamId, EventId) already exists, I need to recompute the aggregate, re-check the business conditions, and write again only if they still pass.
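A minimal sketch of that conditional write with boto3 (the table name, the `ConcurrencyError` type and the error handling are placeholders, not part of the actual design):

```python
import boto3
from botocore.exceptions import ClientError

# Assumed table name; key schema matches the design above:
# partition key StreamId (string), sort key EventId (number).
table = boto3.resource("dynamodb").Table("EventStore")

class ConcurrencyError(Exception):
    """Raised when another writer already appended this EventId."""

def append_event(stream_id, event_id, event_name, payload):
    try:
        table.put_item(
            Item={
                "StreamId": stream_id,
                "EventId": event_id,
                "EventName": event_name,
                "Payload": payload,
            },
            # The optimistic lock: reject the put if an item with this
            # (StreamId, EventId) key already exists.
            ConditionExpression="attribute_not_exists(EventId)",
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            # Lost the race: reload the stream, re-check the business
            # rules, and retry with the next EventId.
            raise ConcurrencyError(f"{stream_id}#{event_id}") from e
        raise
```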
**Event Streams**
Each Event Stream is identified by its partitionKey. Querying a stream for all of its events amounts to querying for partitionKey = ${streamId} and sortKey between 0 and MAX_INT.
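As a sketch (reusing the assumed table handle from above), the read side could look like the following; a key condition on the partition key alone already returns the whole stream, so the explicit sort-key range is optional:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("EventStore")  # assumed table name

def load_stream(stream_id):
    """Return every event of one stream, in EventId order."""
    events = []
    kwargs = {
        "KeyConditionExpression": Key("StreamId").eq(stream_id),
        "ScanIndexForward": True,  # ascending by the sort key (EventId)
    }
    while True:
        page = table.query(**kwargs)
        events.extend(page["Items"])
        # Query results are paginated at 1 MB, so follow the cursor.
        if "LastEvaluatedKey" not in page:
            return events
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```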
Each Event Stream identifies one and only one aggregate, which makes it possible to handle concurrent writes on the same aggregate with the optimistic locking explained above. It also gives very good performance when recomputing an aggregate.
**Publication of events**
Events are published by combining DynamoDB Streams with Lambda.
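A minimal sketch of the Lambda side, assuming the function is attached to the table's stream with the NEW_IMAGE view type, that Payload is stored as a string, and that `publish()` stands in for whatever bus the events are forwarded to:

```python
def handler(event, context):
    """Lambda entry point, triggered by the table's DynamoDB stream."""
    for record in event["Records"]:
        # An append-only store only ever produces INSERTs.
        if record["eventName"] != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]
        # Stream records arrive in the low-level attribute-value format,
        # e.g. {"EventId": {"N": "42"}}.
        domain_event = {
            "streamId": image["StreamId"]["S"],
            "eventId": int(image["EventId"]["N"]),
            "eventName": image["EventName"]["S"],
            "payload": image["Payload"]["S"],
        }
        publish(domain_event)  # hypothetical publisher (SNS, SQS, EventBridge, ...)
```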
**Replay events**
Here's where the issues start. Because each event stream maps to exactly one aggregate (which leads to a very large number of event streams), there is no easy way to know which streams I have to query when I need to replay all events.
I was thinking of keeping an additional record somewhere in DynamoDB that stores all the StreamIds in an array. I could then read it and query the streams one by one, but if a new stream is created while I'm replaying, I'll miss it.
Am I missing something? Or, is my design simply wrong?
Not really; it's a Hard Problem[tm].
Your write use cases are typically only concerned with a single reference within the model -- the pointer to the current history of events. Your read use cases are often concerned with data distributed across multiple streams.
The way that this usually works is that your persistence store not only maintains the changes that have been written, but also an index that supports reads. For example, Eventide's postgres message store depends on the indexing that happens when you insert rows into a table. In the case of Event Store, the updates to the index are written as part of the same serialized "transaction" as the changes to the stream(s).
Another way of expressing the same idea: the queries are actually running at a coarser grain than the writes, with the storage appliance implicitly providing the coordination guarantees that you expect.
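Translated to the DynamoDB design above, one way to approximate that "index maintained together with the write" idea is a transaction that appends the event and also upserts the StreamId into a well-known index table that a replayer can page through. This is only a sketch of the concept, not a recommendation; the table names are made up:

```python
import boto3

client = boto3.client("dynamodb")

def append_with_index(stream_id, event_id, event_name, payload):
    """Atomically write the event and register its stream in an index."""
    client.transact_write_items(
        TransactItems=[
            {
                "Put": {
                    "TableName": "EventStore",  # assumed event table
                    "Item": {
                        "StreamId": {"S": stream_id},
                        "EventId": {"N": str(event_id)},
                        "EventName": {"S": event_name},
                        "Payload": {"S": payload},
                    },
                    # Same optimistic lock as before; a conditional failure
                    # cancels the whole transaction.
                    "ConditionExpression": "attribute_not_exists(EventId)",
                }
            },
            {
                # Idempotent upsert: each stream appears exactly once in the
                # index table, so a replayer can discover every stream by
                # paging through it.
                "Put": {
                    "TableName": "StreamIndex",  # assumed index table
                    "Item": {"StreamId": {"S": stream_id}},
                }
            },
        ]
    )
```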
Take away the coordination, and you have something analogous to assigning a unique host to each stream.
It may be useful to look carefully at the Git object database and familiarize yourself with what's really happening in that store under the covers. I also found that Rich Hickey's talk The Language of the System provided useful concepts for distinguishing values from names from references.
Unless you have some compelling business reason to build your event store from the ground up, I'd encourage you to instead look at Aurora and see how far you can get with that. It might buy you the time you need to wait for somebody else to put together a cost-effective, cloud-native event store appliance for you.