For a simple key-value pair list JSON, use jq to print a summary by range of values

95 Views Asked by At

Consider the following JSON having a list of key-value pairs

{
  "session1": 128,
  "session2": 1048596,
  "session3": 3145728,
  "session4": 3145828,
  "session5": 11534338,
  "session6": 11544336,
  "session7": 2097252
}

The key is a session identifier, and the value is the length of the value stored in the session.

I want to print counts of values by range - the ranges being (lower bound included, high bound excluded); 0-1MB, 1-2MB, 2-3MB, ... 12-13MB.

 1MB =  1048576
 2MB =  2097152
 3MB =  3145728
 4MB =  4194304
 5MB =  5242880
 6MB =  6291456
 7MB =  7340032
 8MB =  8388608
 9MB =  9437184
10MB = 10485760
11MB = 11534336
12MB = 12582912
13MB = 13631488

The expected output is

{
  "0-1MB": 1,
  "1-2MB": 1,
  "2-3MB": 1,
  "3-4MB": 2,
  "10-11MB": 2
}

The above is just representative, suggestions are welcome.

2

There are 2 best solutions below

1
fizzie On BEST ANSWER

The following should work:

to_entries
| map(.value / 1048576 | floor | [tostring, "-", (.+1 | tostring), "MB"] | add)
| group_by(.)
| map({"key": .[0], "value": length})
| from_entries

For your input, it produces the following output:

{
  "0-1MB": 1,
  "1-2MB": 1,
  "11-12MB": 2,
  "2-3MB": 1,
  "3-4MB": 2
}

(11534338 and 11544336 are counted in the "11-12MB" bucket rather than the "10-11MB" one, because 11*2^20 = 11534336, and those numbers are larger than that.)

If you wanted the keys in numeric order, you could also convert them to your preferred string labels after the group_by:

to_entries
| map(.value / 1048576 | floor)
| group_by(.)
| map({"key": [(.[0] | tostring), "-", (.[0]+1 | tostring), "MB"] | add, "value": length})
| from_entries

Which produces:

{
  "0-1MB": 1,
  "1-2MB": 1,
  "2-3MB": 1,
  "3-4MB": 2,
  "11-12MB": 2
}

Both solutions have the same basic steps:

  1. Convert the input object to an array of {"key": x, "value": y} entries (to_entries).
  2. Map the entries into something that identifies the range they're in, by rouding down to the nearest megabyte (.value / 1048576 | floor).
  3. Group by the value (group_by). This produces an array like [[0], [1], [2], [3, 3], [11, 11]] for your input.
  4. For each group, produce an entry where the "key" field is the range label ("X-YMB") and the "value" is the number of elements in the group (length).
  5. Convert the list of entries back to a single object (from_entries).
2
pmf On

Here's an approach using reduce which simply iterates over the input values integer-divided by 1MB, and successively increments the according result field by one.

reduce (.[] / 1048576 | floor) as $k ({}; ."\($k)-\($k+1)MB" += 1)
{
  "0-1MB": 1,
  "1-2MB": 1,
  "3-4MB": 2,
  "11-12MB": 2,
  "2-3MB": 1
}

Demo

The stream of numbers iterated over can, of course, be sorted first to get an object with increasing field names:

reduce (map(.) | sort[] / 1048576 | floor) as $k ({}; ."\($k)-\($k+1)MB" += 1)
{
  "0-1MB": 1,
  "1-2MB": 1,
  "2-3MB": 1,
  "3-4MB": 2,
  "11-12MB": 2
}

Demo