Upsolver Hive or Athena outputs have an Upsert Partition Fields property. What does it do?


When we create a Hive or Athena output in Upsolver, the properties include an Upsert Partition Fields setting. What does this property actually do, and should we set it to Yes or No?


We recommend keeping this set to Yes, as it improves overall performance.

This setting applies when your output is an upsert output, and we recommend using Upsert Partition Fields = Yes. With Yes, processing is more efficient and the historical records are kept in the older partitions. The view always returns the most recent record, and the catalog is automatically updated to point to that record. For example, if the upsert key is userId and a new event arrives for the same userId, the record is only upserted within the current partition (say, the current date partition if you have partitioned by date) and the catalog is updated; the historical records for that userId in older date partitions are not touched. The underlying table therefore holds all records, while the view holds only the latest record per key.
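As a rough mental model only (this is not Upsolver's implementation or API), the Yes behaviour can be sketched in Python. The table is modeled as a dict from a date partition to a dict of userId → record, and the function names `upsert_partition_scoped` and `latest_view` are hypothetical. A new event for an existing key replaces it only inside the current partition, and the "view" picks the copy from the newest partition.

```python
from datetime import date

# Hypothetical model: table[partition_date][user_id] = record (a dict).
Table = dict[date, dict[str, dict]]


def upsert_partition_scoped(table: Table, partition: date, user_id: str, record: dict) -> None:
    """Upsert Partition Fields = Yes (sketch): replace the key only inside
    the current partition; older partitions keep their historical copies."""
    table.setdefault(partition, {})[user_id] = record


def latest_view(table: Table) -> dict[str, dict]:
    """The view (sketch): for each user_id, the record from the newest partition."""
    view: dict[str, dict] = {}
    for partition in sorted(table):  # oldest to newest, so newer partitions overwrite
        for user_id, record in table[partition].items():
            view[user_id] = record
    return view


table: Table = {}
upsert_partition_scoped(table, date(2024, 1, 1), "u1", {"plan": "free"})
upsert_partition_scoped(table, date(2024, 1, 2), "u1", {"plan": "pro"})

print(sum(len(rows) for rows in table.values()))  # 2 rows in the underlying table
print(latest_view(table))  # {'u1': {'plan': 'pro'}}
```

The older 2024-01-01 copy is never read or rewritten by the second upsert, which is where the efficiency gain described above comes from in this simplified model.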

With Upsert Partition Fields = No, only the most recent copy of each record is eventually kept (so the table and the view will eventually look alike), but processing is a little less efficient because older records have to be removed from the older partitions.
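For contrast, here is an equally simplified sketch of the No behaviour, again a hypothetical model rather than Upsolver's actual code. The function name `upsert_global` is made up for illustration. The extra cost shows up as a scan over every other partition to delete older copies of the key. Note that this sketch deletes them immediately, whereas the answer says the cleanup happens eventually.

```python
from datetime import date

# Hypothetical model: table[partition_date][user_id] = record (a dict).
Table = dict[date, dict[str, dict]]


def upsert_global(table: Table, partition: date, user_id: str, record: dict) -> int:
    """Upsert Partition Fields = No (sketch): delete the key from every other
    partition, then write it to the current one. Returns the rows removed."""
    removed = 0
    for other, rows in table.items():  # touches older partitions too: the extra work
        if other != partition and user_id in rows:
            del rows[user_id]
            removed += 1
    table.setdefault(partition, {})[user_id] = record
    return removed


table: Table = {}
upsert_global(table, date(2024, 1, 1), "u1", {"plan": "free"})
removed = upsert_global(table, date(2024, 1, 2), "u1", {"plan": "pro"})

print(removed)  # 1: the older copy was deleted from the 2024-01-01 partition
print(table)    # only the newest copy of u1 remains
```

After the second call the underlying table already matches what a "latest record" view would show, which mirrors the point that table and view end up alike.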