How is the state managed for CDC in Azure Cosmos DB analytical store?


I'm currently developing a POC to investigate the change data capture (CDC) capabilities of the Azure Cosmos DB analytical store. It is a simple pipeline containing a data flow with a Cosmos DB analytical store source and a Parquet sink.

It seems that successive runs of the pipeline just pick up the changes since the last run.

In general this is of course the behaviour I want, but where is this state stored, and how can I reset it?

During development it is useful to be able to re-run the pipeline and experiment with different sinks without having to make additional changes to the underlying Cosmos DB data. In production, I imagine there will occasionally be cases where a pipeline needs to be re-run to reprocess some data.

I was anticipating that I might see a "Checkpoint Key" property in the data flow settings, similar to this SAP CDC example, but I see no such option.
