Apache Samza uses RocksDB as the storage engine for local storage. This allows for stateful stream processing and here's a very good overview.
My use case:
- I have multiple streams of events that I wish to process taken from a system such as Apache Kafka.
- These events create state - the state I wish to track is based on previous messages received.
- I wish to generate new stream events based on the calculated state.
- The input stream events are highly connected and a graph such as OrientDB / Neo4J is the ideal medium for querying the data to create the new stream events.
My question:
Is it possible to use a non-KV store as the local storage for Samza? Has anyone ever done this with OrientDB / Neo4J and is anyone aware of an example?
I've been evaluating Samza and I'm by no means an expert, but I'd recommend you to read the official documentation, and even read through the source code—other than the fact that it's in Scala, it's remarkably approachable.
In this particular case, toward the bottom of the documentation's page on State Management you have this:
I actually read through the code for the default
StorageEngine
implementation about two weeks ago to gain a better sense of how it works. I definitely don't know enough to say much intelligently about it, but I can point you at it:The major implementation concerns seem to be: