Unstack a big data table Kusto by timestamp and category

314 Views Asked by At

I am working with a big dataset on ADX, where I need to unstack rows of data and convert them into columns. The uniqueID in the data is formed by combining three fields: a group, a timestamp and a name.

  datatable(Group:string, timestamp:datetime, channel_name:string , value:long)
[
    "A", datetime(2019-05-01 00:00:01), "channel_1", 12,
    "A", datetime(2019-05-01 00:00:02), "channel_1",  14,
    "A", datetime(2019-05-01 00:00:03),  "channel_1",  16,
    "A", datetime(2019-05-01 00:00:01), "channel_2", 12,
    "A", datetime(2019-05-01 00:00:02), "channel_2",  14,
    "A", datetime(2019-05-01 00:00:01),  "channel_3",  16,
    "B", datetime(2019-04-01 00:00:01),  "channel_1", 3,
    "B", datetime(2019-04-01 00:00:04),  "channel_1", 5,
    "B", datetime(2019-04-01 00:00:07),  "channel_2", 1,
    "B", datetime(2019-04-01 00:00:10),  "channel_3", 8,
] 
    

The expected output for one group (result can be with or without group column -- since group filter will always be applied)

group, timestamp, channel_1, channel_2, channel_3
    "A", datetime(2019-05-01 00:00:01), 12, 12, NULL,
    "A", datetime(2019-05-01 00:00:02), 14,14, NULL,
    "A", datetime(2019-05-01 00:00:03), 16, NULL,NULL,  

Tried running this following query (based on), but this does not unstack the columns as expected. This returned data in same format as above.

| where timestamp > datetime(2019-04-01) and timestamp <datetime(2019-04-03) \\ filter1 Always applied
| where machine_name =='A' \\filter2 Always Applied

| where channel_name  in  ("channel1, channel2, channel3")
| summarize value=sum(value) by channel_name, timestamp
1

There are 1 best solutions below

4
On BEST ANSWER

you could try this:

datatable(group:string, timestamp:datetime, channel_name:string , value:long)
[
    "A", datetime(2019-05-01 00:00:01), "channel_1", 12,
    "A", datetime(2019-05-01 00:00:02), "channel_1", 14,
    "A", datetime(2019-05-01 00:00:03), "channel_1", 16,
    "A", datetime(2019-05-01 00:00:01), "channel_2", 12,
    "A", datetime(2019-05-01 00:00:02), "channel_2", 14,
    "A", datetime(2019-05-01 00:00:01), "channel_3", 16,
    "B", datetime(2019-04-01 00:00:01), "channel_1", 3,
    "B", datetime(2019-04-01 00:00:04), "channel_1", 5,
    "B", datetime(2019-04-01 00:00:07), "channel_2", 1,
    "B", datetime(2019-04-01 00:00:10), "channel_3", 8,
]
| where group == "A"
| summarize b = make_bag(pack(channel_name, value)) by timestamp
| project timestamp, channel_1 = tolong(b.channel_1), channel_2 = tolong(b.channel_2), channel_3 = tolong(b.channel_3)

or this (less efficient due to usage of bag_unpack()):

datatable(group:string, timestamp:datetime, channel_name:string , value:long)
[
    "A", datetime(2019-05-01 00:00:01), "channel_1", 12,
    "A", datetime(2019-05-01 00:00:02), "channel_1", 14,
    "A", datetime(2019-05-01 00:00:03), "channel_1", 16,
    "A", datetime(2019-05-01 00:00:01), "channel_2", 12,
    "A", datetime(2019-05-01 00:00:02), "channel_2", 14,
    "A", datetime(2019-05-01 00:00:01), "channel_3", 16,
    "B", datetime(2019-04-01 00:00:01), "channel_1", 3,
    "B", datetime(2019-04-01 00:00:04), "channel_1", 5,
    "B", datetime(2019-04-01 00:00:07), "channel_2", 1,
    "B", datetime(2019-04-01 00:00:10), "channel_3", 8,
]
| where group == "A"
| summarize b = make_bag(pack(channel_name, value)) by timestamp
| evaluate bag_unpack(b)

both output this table:

| timestamp                   | channel_1 | channel_2 | channel_3 |
|-----------------------------|-----------|-----------|-----------|
| 2019-05-01 00:00:01.0000000 | 12        | 12        | 16        |
| 2019-05-01 00:00:02.0000000 | 14        | 14        |           |
| 2019-05-01 00:00:03.0000000 | 16        |           |           |