KQL: deduplication, record priority and sorting

101 Views Asked by At

I'm new to Kusto and trying to figure out how to de-dupe in KQL. I have a table in Kusto that stores account information. For some of the accounts, the table has 2 rows: one with updated_c == true and another row with updated_c == false. I would like to write a KQL script that would survive the row with updated_c == true in these cases of duplicate accounts. In other words, I'm trying to de-dupe this dataset where when there are 2 rows per each account, it would survive the one with updated_c == true. And if there are only one row per perspective account, it would survive it regardless of updated_c vlaues.

account Id updated_c
12 true
12 false

would anyone be able to help me learn how to de-dupe this?

1

There are 1 best solutions below

0
On BEST ANSWER

This should do it.

datatable
| extend updated_c_int = toint(updated_c)
| summarize arg_max(updated_c_int, *) by accountId
| project accountId, updated_c
  • extend updated_c_int = toint(updated_c): Convert the boolean updated_c to an integer. true becomes 1 and false becomes 0.
  • summarize arg_max(updated_c_int, *) by accountId: For each accountId, get the row with the highest value of updated_c_int. If there's a tie, arg_max will return the first row it encounters.
  • project accountId, updated_c: Only select the columns accountId and updated_c in the final output.

Given your example input, this query should return:

accountId    updated_c
12           true