Use of standard CommitLogReader for Change Data Capture?


These questions relate to implementing a specialized form of Change Data Capture (CDC) for Cassandra.

(1) Is there a brokerless, non-distributed, reliable queuing package for Java that is capable of encryption at rest and of supporting concurrent reads and writes? I don't see a suitable freeware package that avoids streaming. This seems to rule out JVM agents such as DataStax's "CDC Agent for Apache Cassandra" and standalone agents like Debezium.

(2) Is there a lighter-weight Maven package than cassandra-all that supports Apache Cassandra's standard CommitLogReader.class for CDC purposes?

(3) Does Apache Cassandra's standard CommitLogReader.class in the cassandra-all version 4.1.3 package from Maven have an interface that will read all mutations from the minimum position to a specified end index? Documentation for Cassandra's Change Data Capture (CDC) says that one should look for *.idx files in the cdc_raw directory, and go no further into the corresponding *.log file than the index written in the *.idx file. This avoids reading unflushed mutations.
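To make the *.idx rule concrete, here is a minimal sketch of how I read the index file. It assumes the layout described in the CDC documentation: the first line is the flushed offset into the segment, and a second line containing COMPLETED is appended once the segment is final. The class `CdcIndex` and its methods are hypothetical helpers of my own, not part of cassandra-all.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper (not a Cassandra class): parsed contents of a *_cdc.idx file. */
final class CdcIndex {
    /** Byte offset in the matching *.log segment up to which data has been flushed. */
    final int flushedOffset;
    /** True once Cassandra has appended the COMPLETED marker line. */
    final boolean completed;

    private CdcIndex(int flushedOffset, boolean completed) {
        this.flushedOffset = flushedOffset;
        this.completed = completed;
    }

    /** Parses the lines of an index file: offset first, optional COMPLETED second. */
    static CdcIndex parse(List<String> lines) {
        if (lines.isEmpty() || lines.get(0).trim().isEmpty()) {
            throw new IllegalArgumentException("index file has no offset line");
        }
        int offset = Integer.parseInt(lines.get(0).trim());
        boolean done = lines.size() > 1 && "COMPLETED".equals(lines.get(1).trim());
        return new CdcIndex(offset, done);
    }

    /** Reads and parses an index file from disk. */
    static CdcIndex read(Path idxFile) throws IOException {
        return parse(Files.readAllLines(idxFile, StandardCharsets.UTF_8));
    }

    /**
     * Assumption: endOffset is the position in the segment where a mutation ends.
     * A mutation is safe to consume only if it ends at or before the flushed offset.
     */
    boolean isFlushed(int endOffset) {
        return endOffset <= flushedOffset;
    }
}
```

A consumer would call `CdcIndex.read(...)` on each poll and discard any mutation for which `isFlushed(...)` is false, which is the check I cannot find a way to push down into the standard reader.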

I can't find a method signature on CommitLogReader in cassandra-all 4.1.3 that is suited to reading either a single mutation or all flushed mutations. For the single-mutation case, a read is possible, but the only way I see to return the new minimum position to the caller of any CommitLogReader method is to capture it in the CommitLogReadHandler and expose it from there. (It also means re-reading mutations from the beginning of the file until the minimum position is reached.) For the all-flushed case, every method appears to read to the end of the file, regardless of the flushed offset recorded in the *.idx file. A note in the initial implementation seems to anticipate adding the missing functionality later.

I'd prefer not to roll my own CommitLogReader, in part to ensure compatibility between the CDC consumer and the DB, and in part to have available the functionality that the standard implementation provides.

So I'm inclined to run a standalone service that calls the standard CommitLogReader in a loop, one mutation at a time, as described above. Unfortunately, this requires mocking up a good part of the DB schema and carries the full footprint of the cassandra-all Maven package.
