How do I remove the latest transaction in a Foundry incremental build/transform?

I have an incremental dataset and would like to remove its last transaction. I attached a screenshot below and added a border around the transaction I would like removed. I want to remove it while preserving the dataset's "incrementality".

[Screenshot: issues_incremental_basic]

Best answer

Yes, it is possible to delete one or several transactions from an incrementally built dataset without breaking its incrementality.

The only way to delete a transaction is through Foundry API calls. If you are not familiar with the APIs, consult the API guidelines first. We strongly recommend trying these instructions on a test dataset until you are comfortable with the process.

The options available depend on your downstream datasets:

SCENARIO 1: Your downstream datasets are running incrementally

You can roll back your dataset to the latest successful transaction by using the "updateBranch2" endpoint (branchesUpdate2) of Foundry's Catalog API. Additional information is available in this Stack Overflow thread:

    curl -X POST \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    "https://$HOSTNAME/foundry-catalog/api/catalog/datasets/$DATASET_RID/branchesUpdate2/master" \
    -d "\"$TRANSACTION_RID\""

The result is that your downstream datasets will continue to run incrementally.
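
The rollback request above can be wrapped in small shell helpers so that the transaction RID is always sent as a quoted JSON string. This is a minimal sketch rather than an official client: the helper names `rollback_url`, `rollback_body`, and `rollback_branch` are hypothetical, and only the endpoint path is taken from the curl call above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the endpoint path comes from
# the curl call in this answer.

# Build the branchesUpdate2 URL for a dataset and branch.
rollback_url() {
    # $1 = hostname, $2 = dataset RID, $3 = branch name (e.g. master)
    printf 'https://%s/foundry-catalog/api/catalog/datasets/%s/branchesUpdate2/%s' "$1" "$2" "$3"
}

# The request body is a bare JSON string, so the RID is wrapped in double quotes.
rollback_body() {
    # $1 = RID of the transaction the branch should point to
    printf '"%s"' "$1"
}

# Send the request. Nothing runs until you call this function yourself.
# Note: bash presets HOSTNAME to the local machine name, so set it explicitly
# to your Foundry host before calling.
rollback_branch() {
    curl -X POST \
        -H "Authorization: Bearer $TOKEN" \
        -H "Content-Type: application/json" \
        "$(rollback_url "$HOSTNAME" "$DATASET_RID" master)" \
        -d "$(rollback_body "$TRANSACTION_RID")"
}
```

Printing `rollback_url` and `rollback_body` before calling `rollback_branch` is a cheap way to check the request on a test dataset first.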

SCENARIO 2: Your downstream datasets are NOT running incrementally

You can remove specific files.

The lifecycle of a transaction is as follows:

  1. Start a new transaction, setting the transaction type and specifying what you want the transaction to do.
  2. If you are not satisfied, you can abort the transaction. Once you are happy with what it will do, commit the transaction (this is the point of no return).
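
The abort-or-commit decision in this lifecycle can be sketched as one helper that picks the endpoint for an open transaction. The `/commit` path is the one used in this answer; the `/abort` path is assumed here to mirror it, and `finish_transaction_url` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: finish_transaction_url is a hypothetical helper.
# The /commit path is used in this answer; /abort is assumed to mirror it.

finish_transaction_url() {
    # $1 = hostname, $2 = dataset RID, $3 = transaction RID, $4 = commit | abort
    case "$4" in
        commit|abort) ;;
        *) echo "action must be commit or abort" >&2; return 1 ;;
    esac
    printf 'https://%s/api/v1/datasets/%s/transactions/%s/%s' "$1" "$2" "$3" "$4"
}

# Usage sketch (not executed here): abort while unsure, commit only when ready.
#   url=$(finish_transaction_url "$HOSTNAME" "$DATASET_RID" "$TRANSACTION_RID" abort)
#   curl -X POST -H "Authorization: Bearer $TOKEN" "$url"
```

Rejecting any action other than `commit` or `abort` keeps a typo from turning into an unexpected request against an open transaction.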

Therefore, to delete specific files, follow these steps:

  1. Create a transaction with a transaction type of DELETE

      curl -X POST \
      -H "Content-type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      "https://$HOSTNAME/api/v1/datasets/$DATASET_RID/transactions" \
      -d '{"transactionType":"DELETE"}'

    $DATASET_RID is the dataset RID, which you can find in the dataset's URL.

    Example: ri.foundry.main.dataset.c26f11c8-cdb3-4f44-9f5d-9816ea1c82da

  2. Add files to the DELETE transaction by opening the logical path of each file you want to delete

    You can get the file paths from the dataset's Details tab, under Files.

    Example: spark/part-00000-d5e90287-22bd-4840-a6a0-6eb1d98d0af3-c000.snappy.parquet

      curl -X POST \
      -H "Content-type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      "https://$HOSTNAME/foundry-catalog/api/catalog/datasets/$DATASET_RID/transactions/$TRANSACTION_RID/files/open/$FILEPATH"

    $TRANSACTION_RID is returned in the response body of the first API call.

  3. Commit your transaction

      curl -X POST \
      -H "Content-type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      "https://$HOSTNAME/api/v1/datasets/$DATASET_RID/transactions/$TRANSACTION_RID/commit"

At any time before committing, you can abort the transaction with abortTransaction or list the files currently in it with getFilesInTransactionPaged2.
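
To chain the three steps in a script, the transaction RID from step 1 has to be pulled out of the response body before steps 2 and 3 can run. The sketch below assumes the response is a single JSON object with a top-level `rid` field (an assumption; check the real response on a test dataset). `extract_rid`, `open_file_url`, and the file `files_to_delete.txt` are hypothetical names.

```shell
#!/bin/sh
# Sketch only. Assumption: the create-transaction response is a JSON object
# such as {"rid":"ri.foundry.main.transaction....", ...}.
# extract_rid and open_file_url are hypothetical helper names.

# Read a JSON response on stdin and print the value of its "rid" field.
# Only the first line containing a "rid" field is used.
extract_rid() {
    sed -n 's/.*"rid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Build the "open file" URL from step 2 for one logical file path.
open_file_url() {
    # $1 = hostname, $2 = dataset RID, $3 = transaction RID, $4 = logical path
    printf 'https://%s/foundry-catalog/api/catalog/datasets/%s/transactions/%s/files/open/%s' "$1" "$2" "$3" "$4"
}

# Usage sketch (not executed here); files_to_delete.txt holds one path per line.
#   TRANSACTION_RID=$(curl -s -X POST \
#       -H "Content-type: application/json" \
#       -H "Authorization: Bearer $TOKEN" \
#       "https://$HOSTNAME/api/v1/datasets/$DATASET_RID/transactions" \
#       -d '{"transactionType":"DELETE"}' | extract_rid)
#   while IFS= read -r path; do
#       curl -X POST -H "Authorization: Bearer $TOKEN" \
#           "$(open_file_url "$HOSTNAME" "$DATASET_RID" "$TRANSACTION_RID" "$path")"
#   done < files_to_delete.txt
```

Parsing with `sed` keeps the sketch dependency-free; a JSON-aware tool is the safer choice if the response turns out to be nested or pretty-printed differently.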

Committing a DELETE transaction does not delete the underlying file from the backing file system; it only removes the file reference from the dataset view.

DELETE transactions break incrementality. Therefore, if this dataset feeds downstream incremental datasets, committing a DELETE transaction will break the incrementality of their builds.