How to check if values of 'column1' are within +-20% range of values of 'column2' using Amazon Deequ?


I'm using Amazon Deequ in Spark, and I have a DataFrame 'df' with two numeric columns (for example of type 'Long'). I simply want to check:

value(column1) lies between value(column2)-20% and value(column2)+20% for all rows

I'm not sure what check to put here:

val verificationResult: VerificationResult = VerificationSuite()
  .onData(df)
  .addCheck(
    Check(CheckLevel.Error, "Review Check")
      //.functionToCheckThis()
  )
  .run()

Best answer

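Check has a satisfies method that takes a Spark SQL expression over the columns as its condition, followed by a name for the constraint.

To check that column1 lies between column2 - 20% and column2 + 20%, you can use an expression like:

abs(column1 - column2) <= 0.20 * column2

or, equivalently, column1 between 0.80 * column2 and 1.20 * column2. Both forms assume column2 is non-negative: for a negative column2 the right-hand side is negative (and the between bounds are reversed), so no row can pass. If column2 can be negative, compare against 0.20 * abs(column2) instead. With the first form: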

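import com.amazon.deequ.{VerificationResult, VerificationSuite}
import com.amazon.deequ.checks.{Check, CheckLevel}

val verificationResult: VerificationResult = {
  VerificationSuite()
    .onData(df)
    .addCheck(
      Check(CheckLevel.Error, "Review Check")
        .satisfies(
          // Spark SQL predicate, evaluated for every row
          "abs(column1 - column2) <= 0.20 * column2",
          // constraint name that appears in the check results
          "value(column1) lies between value(column2)-20% and value(column2)+20%"
        )
    ).run()
}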
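
To see whether the check actually passed, you can look at the status of the result. Below is a minimal sketch, assuming Spark and Deequ are on the classpath; the object name ToleranceCheckExample, the check helper, the local SparkSession and the sample rows are made up for illustration, and the predicate uses abs(column2) so that negative values of column2 are handled too.

```scala
import com.amazon.deequ.{VerificationResult, VerificationSuite}
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, only here to make the sketch runnable.
object ToleranceCheckExample {

  // Runs the +-20% check on any DataFrame with numeric column1 and column2.
  def check(df: DataFrame): VerificationResult =
    VerificationSuite()
      .onData(df)
      .addCheck(
        Check(CheckLevel.Error, "Review Check")
          .satisfies(
            // abs(column2) is an assumption added to cover negative column2
            "abs(column1 - column2) <= 0.20 * abs(column2)",
            "column1 within 20% of column2"
          )
      )
      .run()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("deequ-tolerance-check")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: the third row (50 vs 100) is outside the band,
    // so this check is expected to fail.
    val df = Seq((100L, 110L), (95L, 100L), (50L, 100L))
      .toDF("column1", "column2")

    val result = check(df)

    if (result.status == CheckStatus.Success) {
      println("All rows are within the tolerance band.")
    } else {
      // One row per constraint, including its status and failure message.
      VerificationResult.checkResultsAsDataFrame(spark, result).show(truncate = false)
    }

    spark.stop()
  }
}
```

By default satisfies requires every row to match. If some violations are acceptable, it also takes an optional assertion on the fraction of matching rows as a third argument, for example _ >= 0.95.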