For a simple key-value pair list JSON, use jq to print a summary by range of values

Question

95 Views Asked by Jojo Thomas At 30 July 2023 at 10:56

Consider the following JSON object containing a list of key-value pairs:

{
  "session1": 128,
  "session2": 1048596,
  "session3": 3145728,
  "session4": 3145828,
  "session5": 11534338,
  "session6": 11544336,
  "session7": 2097252
}

The key is a session identifier, and the value is the length of the value stored in the session.

I want to print counts of values by range, where each range includes its lower bound and excludes its upper bound: 0-1MB, 1-2MB, 2-3MB, ... 12-13MB.

 1MB =  1048576
 2MB =  2097152
 3MB =  3145728
 4MB =  4194304
 5MB =  5242880
 6MB =  6291456
 7MB =  7340032
 8MB =  8388608
 9MB =  9437184
10MB = 10485760
11MB = 11534336
12MB = 12582912
13MB = 13631488

The expected output is

{
  "0-1MB": 1,
  "1-2MB": 1,
  "2-3MB": 1,
  "3-4MB": 2,
  "10-11MB": 2
}

The above is just representative; suggestions are welcome.

There are 2 best solutions below

pmf On 30 July 2023 at 12:57

Here's an approach using reduce, which iterates over the input values integer-divided by 1MB and increments the corresponding result field by one for each of them.

reduce (.[] / 1048576 | floor) as $k ({}; ."\($k)-\($k+1)MB" += 1)

{
  "0-1MB": 1,
  "1-2MB": 1,
  "3-4MB": 2,
  "11-12MB": 2,
  "2-3MB": 1
}

The stream of numbers iterated over can, of course, be sorted first to get an object with increasing field names:

reduce (map(.) | sort[] / 1048576 | floor) as $k ({}; ."\($k)-\($k+1)MB" += 1)

{
  "0-1MB": 1,
  "1-2MB": 1,
  "2-3MB": 1,
  "3-4MB": 2,
  "11-12MB": 2
}
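
As an independent cross-check of the reduce idea, here is a minimal sketch of the same single-pass fold in Python rather than jq. The names MB, bump and summarize_fold are made up for this illustration and are not part of jq or of the answer above.

```python
from functools import reduce

MB = 1048576  # 2**20 bytes, the bucket width used in the question


def bump(acc: dict, size: int) -> dict:
    """Increment the counter of the bucket that `size` falls into."""
    k = size // MB                      # same effect as `/ 1048576 | floor`
    label = f"{k}-{k + 1}MB"
    acc[label] = acc.get(label, 0) + 1  # a missing field starts at 0, like `+= 1` on null in jq
    return acc


def summarize_fold(sizes, sort_first=False):
    """Fold the sizes into {label: count}, optionally sorting them first."""
    stream = sorted(sizes) if sort_first else sizes
    return reduce(bump, stream, {})


# Values from the question, in their original order.
sizes = [128, 1048596, 3145728, 3145828, 11534338, 11544336, 2097252]
print(summarize_fold(sizes))                   # labels in first-seen order
print(summarize_fold(sizes, sort_first=True))  # labels in increasing order
```

Since the result is built up in the order the buckets are first seen, sorting the stream beforehand is what yields increasing labels, just as in the second jq filter.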

**fizzie** · Accepted Answer · 2023-07-30T11:14:53.217000

The following should work:

to_entries
| map(.value / 1048576 | floor | [tostring, "-", (.+1 | tostring), "MB"] | add)
| group_by(.)
| map({"key": .[0], "value": length})
| from_entries

For your input, it produces the following output:

{
  "0-1MB": 1,
  "1-2MB": 1,
  "11-12MB": 2,
  "2-3MB": 1,
  "3-4MB": 2
}

(11534338 and 11544336 are counted in the "11-12MB" bucket rather than the "10-11MB" one, because 11*2^20 = 11534336, and those numbers are larger than that.)
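
The boundary arithmetic can be checked with a short sketch. It is written in Python rather than jq, purely as a cross-check, and the names MB and bucket_label are hypothetical helpers for this illustration.

```python
MB = 1048576  # 2**20 bytes


def bucket_label(size: int) -> str:
    """Return the "X-YMB" label for a size (lower bound included, upper bound excluded)."""
    k = size // MB  # floor division, matching `/ 1048576 | floor`
    return f"{k}-{k + 1}MB"


# 11 * 2**20 is exactly 11534336, so anything at or above it belongs to "11-12MB".
for size in (11534335, 11534336, 11534338, 11544336):
    print(size, bucket_label(size))
```

Only 11534335, one byte below the boundary, would still land in the "10-11MB" bucket.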

If you wanted the keys in numeric order, you could also convert them to your preferred string labels after the group_by:

to_entries
| map(.value / 1048576 | floor)
| group_by(.)
| map({"key": [(.[0] | tostring), "-", (.[0]+1 | tostring), "MB"] | add, "value": length})
| from_entries

Which produces:

{
  "0-1MB": 1,
  "1-2MB": 1,
  "2-3MB": 1,
  "3-4MB": 2,
  "11-12MB": 2
}

Both solutions have the same basic steps:

1. Convert the input object to an array of {"key": x, "value": y} entries (to_entries).
2. Map the entries into something that identifies the range they're in, by rounding down to the nearest megabyte (.value / 1048576 | floor).
3. Group by the value (group_by). This produces an array like [[0], [1], [2], [3, 3], [11, 11]] for your input.
4. For each group, produce an entry where the "key" field is the range label ("X-YMB") and the "value" is the number of elements in the group (length).
5. Convert the list of entries back to a single object (from_entries).
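
If it helps to see the same steps outside jq, they can be sketched roughly as follows in Python. This is only an illustration under my own naming (MB and summarize are not jq builtins), with itertools.groupby on a sorted list standing in for group_by.

```python
from itertools import groupby

MB = 1048576  # 2**20 bytes


def summarize(sessions: dict) -> dict:
    """Count session sizes per 1MB range, mirroring the jq pipeline steps."""
    # Step 1 (to_entries): view the object as (key, value) pairs.
    entries = sessions.items()
    # Step 2 (map + floor): reduce each entry to its bucket index.
    indices = [value // MB for _key, value in entries]
    # Step 3 (group_by): sort, then collect equal indices into groups.
    groups = [list(group) for _index, group in groupby(sorted(indices))]
    # Steps 4 and 5 (map + from_entries): label each group and count its members.
    return {f"{g[0]}-{g[0] + 1}MB": len(g) for g in groups}


sessions = {
    "session1": 128,
    "session2": 1048596,
    "session3": 3145728,
    "session4": 3145828,
    "session5": 11534338,
    "session6": 11544336,
    "session7": 2097252,
}
print(summarize(sessions))
# {'0-1MB': 1, '1-2MB': 1, '2-3MB': 1, '3-4MB': 2, '11-12MB': 2}
```

Because the grouping happens on the numeric index before the labels are built, the keys come out in numeric order, as in the second jq variant.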
