How do I update part of a dataset without doing a snapshot build of the whole dataset?

We have datasets partitioned on date, with a history going back some arbitrary amount of time.

If we need to apply updates to a particular day's data, we'd ideally want to replace just that day's data with new data, and leave data for all other days unchanged.

In Spark, this seems to be possible by setting partitionOverwriteMode to dynamic (see "Overwrite specific partitions in spark dataframe write method").
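For reference, this is roughly what the plain-Spark approach looks like. It is only a sketch: the function name, the `path` argument, and the `date` partition column are placeholders, and it assumes a Parquet-backed dataset written directly with Spark 2.3 or later. Whether the same setting is honoured inside a Foundry transform is exactly what I'm unsure about.

```python
def overwrite_partitions(spark, df, path, partition_col="date"):
    """Replace only the partitions that appear in `df`.

    spark: an existing SparkSession
    df:    a DataFrame holding the corrected rows for the affected day(s)
    path:  root location of the partitioned dataset (placeholder)
    """
    # The default mode is "static": an overwrite first deletes every
    # partition under `path`. With "dynamic", Spark only replaces the
    # partitions for which `df` actually contains rows.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    (
        df.write
        .mode("overwrite")
        .partitionBy(partition_col)
        .parquet(path)
    )
```

Called as `overwrite_partitions(spark, corrected_day_df, "/data/events")`, this should rewrite only the `date=...` directories present in `corrected_day_df` and leave every other day as it was.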

In the Foundry documentation of Snapshot vs. Incremental builds, there is no mention of updating datasets - it seems to only address appending to datasets via Incremental.
