I'm currently developing a proof-of-concept dataflow to investigate the change data capture capabilities of the Cosmos DB analytical store (a simple pipeline containing a dataflow with a Cosmos DB analytical store source and a Parquet sink).
Successive runs of the pipeline appear to pick up only the changes made since the previous run. In general this is exactly the behaviour I want, but where is this state stored, and how can I reset it?
During development it would be useful to re-run the pipeline and experiment with different sinks without having to make additional changes to the underlying Cosmos DB data. In production, too, I imagine there will occasionally be cases where a pipeline needs to be re-run to re-process some of the data.
I was expecting to see a "Checkpoint Key" property in the dataflow activity settings, similar to this SAP CDC example, but I see no such option.