KQL: deduplication, record priority and sorting

139 Views Asked by SusanD At 21 October 2023 at 19:15

I'm new to Kusto and trying to figure out how to de-dupe in KQL. I have a table in Kusto that stores account information. For some of the accounts, the table has 2 rows: one with updated_c == true and another row with updated_c == false. I would like to write a KQL script that would survive the row with updated_c == true in these cases of duplicate accounts. In other words, I'm trying to de-dupe this dataset where when there are 2 rows per each account, it would survive the one with updated_c == true. And if there are only one row per perspective account, it would survive it regardless of updated_c vlaues.

account Id	updated_c
12	true
12	false

would anyone be able to help me learn how to de-dupe this?

Original Q&A

There are 1 best solutions below

Daredevil On 21 October 2023 at 20:54 BEST ANSWER

This should do it.

datatable
| extend updated_c_int = toint(updated_c)
| summarize arg_max(updated_c_int, *) by accountId
| project accountId, updated_c

extend updated_c_int = toint(updated_c): Convert the boolean updated_c to an integer. true becomes 1 and false becomes 0.
summarize arg_max(updated_c_int, *) by accountId: For each accountId, get the row with the highest value of updated_c_int. If there's a tie, arg_max will return the first row it encounters.
project accountId, updated_c: Only select the columns accountId and updated_c in the final output.

Given your example input, this query should return:

accountId    updated_c
12           true

KQL: deduplication, record priority and sorting

There are 1 best solutions below

Related Questions in SORTING

Related Questions in DUPLICATES

Related Questions in KQL

Related Questions in AZURE-DATA-EXPLORER

Related Questions in SUMMARIZE

Trending Questions

Popular # Hahtags

Popular Questions