I'm new to Kusto and trying to figure out how to de-dupe in KQL. I have a table in Kusto that stores account information. For some of the accounts, the table has 2 rows: one with updated_c == true and another row with updated_c == false. I would like to write a KQL script that would survive the row with updated_c == true in these cases of duplicate accounts. In other words, I'm trying to de-dupe this dataset where when there are 2 rows per each account, it would survive the one with updated_c == true. And if there are only one row per perspective account, it would survive it regardless of updated_c vlaues.
| account Id | updated_c |
|---|---|
| 12 | true |
| 12 | false |
would anyone be able to help me learn how to de-dupe this?
This should do it.
extend updated_c_int = toint(updated_c): Convert the booleanupdated_cto an integer.truebecomes 1 andfalsebecomes 0.summarize arg_max(updated_c_int, *) by accountId: For eachaccountId, get the row with the highest value ofupdated_c_int. If there's a tie,arg_maxwill return the first row it encounters.project accountId, updated_c: Only select the columnsaccountIdandupdated_cin the final output.Given your example input, this query should return: