Force Samza key/value store backed by RocksDB to reload from Kafka changelog?


In order to debug a production problem, I am running Samza code locally using ProcessJobFactory. Everything appears to run fine.

The code uses a Samza key/value store backed by RocksDB, with Kafka as the changelog (Kafka is running on a different machine, in case that matters).
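For reference, the store is wired up roughly like this in the job config (the store name, serdes, and topic below are placeholders, not the real job's values):

    # RocksDB-backed key/value store with a Kafka changelog (names are hypothetical)
    stores.my-kv-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
    stores.my-kv-store.key.serde=string
    stores.my-kv-store.msg.serde=string
    # changelog is <system>.<topic>; Samza restores the store from this topic on startup
    stores.my-kv-store.changelog=kafka.my-kv-store-changelog
    serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory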

To populate the environment with real data for debugging, I replayed live data into the Kafka changelog topic for the RocksDB-backed key/value store while the Samza job was stopped.

Upon starting Samza, it does not resync the RocksDB database with the Kafka changelog. I verified this with the Keylord tool by looking at the contents of the RocksDB database directly.

How can Samza be forced to resync the RocksDB database (key/value store) with the changelog? Is there a config setting or a code-level call that can be made?

Related: am I correct in assuming that when the code calls key-value-store.all(), it goes to RocksDB and pulls all entries from there, even if the in-memory cache is empty?
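For context, the access pattern is roughly the following (class and variable names are mine, not the production code); my understanding is that all() flushes any write-behind cache and then iterates the underlying RocksDB store, and that the returned iterator has to be closed:

    import org.apache.samza.storage.kv.Entry;
    import org.apache.samza.storage.kv.KeyValueIterator;
    import org.apache.samza.storage.kv.KeyValueStore;

    // Sketch only: dump every entry in the store.
    public class StoreDump {
        public static void dump(KeyValueStore<String, String> store) {
            KeyValueIterator<String, String> it = store.all();
            try {
                while (it.hasNext()) {
                    Entry<String, String> entry = it.next();
                    System.out.println(entry.getKey() + " -> " + entry.getValue());
                }
            } finally {
                it.close(); // the RocksDB iterator holds resources until closed
            }
        }
    }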

Thanks,


There is 1 answer below

Answered by Rayman Preet:

Have you tried deleting the store directory where the Samza job hosts its RocksDB stores? It will be under the job.logged.store.base.dir you have configured (see https://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html), which defaults to the user.dir system property. With the local store gone, Samza rebuilds it from the Kafka changelog on the next start, so your replayed data should be picked up.
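For example, pinning the directory explicitly makes it easy to find and wipe between runs (the path and store name below are placeholders):

    # Local directory for changelogged stores; each task's RocksDB files live under here
    job.logged.store.base.dir=/var/samza/state
    # The store's changelog topic; Samza replays it to rebuild a missing local store
    stores.my-kv-store.changelog=kafka.my-kv-store-changelog

With the job stopped, delete the store's directory under that path and restart the job; the container should then bootstrap the RocksDB store from the changelog topic, which would pick up the data you replayed.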