Apache Samza flush table update to changelog immediately

96 Views Asked by At

If I specify a changelog backing for a RocksDB Table in Samza. Is there configuration to update the async write time to the changelog? I want to reduce it to a shorter time. I cannot see anything in the Config reference.

The scenario I want is too write to a changelog from a stream after bridging a legacy JMS connection. This legacy connection provides partial updates and I want to merge the partial updates into a fuller message building a cache of these messages in the samza streaming application and write these down to a changelog.

If I use a changelog configured with stores.store-name.changelog then it will write to the changelog eventually changes I make to the Samze API Table. But not quick enough for my needs so want to configure the max wait time to propagate to changelog.

Alternatively it seems that using the withSideInputs to bootstrap my table each time and then using sendTo will work faster to update and I can keep a LocalStore to read and write the cache too and always have the changelog as golden source.

The reason I want the changelog to write quickly too is because other applications are reading from this changelog.

1

There are 1 best solutions below

0
On BEST ANSWER

Yes you can configure the time it will commit changes to the changelog usin the config:

task.commit.ms

Docs

Then writes to the store will be flushed when the commit happens:

profileTable.put(message.key, message.value) 

A note on this higher volumes of input appear to result in changes going to changelog topic before this commit millisecond configuration. Also be careful not to put too low as will slow down overall throughout massively with higher volumes.

You can also use the low level API to commit on a particular stream task the TaskCoordinator provides commit api to manually commit.