How can I filter for multiple object identifiers using the equivalent of wildcards or alternation in jq?


Summary

I have a working jq filter that correctly parses three different name and payload objects and massages them into the desired output format. The problem is that I have to spell out each object path explicitly, because I can't seem to find a way to express alternation within the object identifiers.

I would like the filter to be more flexible so that it can show the data structure from containers down to any package manager with a title that starts with Packages. I need the whole structure, though, and not just the terminal nodes.

What I think I need is to express my object identifiers with alternation or wildcards, such as:

  • .capabilities.*.payload?
  • .capabilities.([apk, dpkg, rpm]).payload?

I realize neither of the above is valid jq syntax, which is the reason for the question. I have included a test corpus with valid JSON immediately below, and my current jq filter is listed in the section below that.

Minimal JSON File

This is my test corpus, stored as minimal.json in the current directory.

{
  "containers": {
    "3dc76c82e566a116e5b64bc91a0b6220c71db7052f68317ebbe90521db55bf36": {
      "container_name": "/apache-46869",
      "capabilities": {
        "apk": {
          "title": "Packages (APK)"
        },
        "dpkg": {
          "title": "Packages (DPKG)",
          "payload": {
            "apt": "1.0.9.8.4",
            "libnghttp2-14": "1.18.1-1"
          }
        },
        "rpm": {
          "title": "Packages (RPM)"
        }
      }
    },
    "474047a1fe238e39fa1917aff0c93154624bbf159d321d49d5e685302589ab51": {
      "container_name": "/nginx-alpine-46869",
      "capabilities": {
        "apk": {
          "title": "Packages (APK)",
          "payload": {
            ".nginx-rundeps": "0",
            "apk-tools": "2.6.8-r2"
          }
        },
        "dpkg": {
          "title": "Packages (DPKG)"
        },
        "rpm": {
          "title": "Packages (RPM)"
        }
      }
    },
    "d7dcd90791240d78022941cf054a6b474f5329acd79aa15b58dc342f95a8ce33": {
      "container_name": "/apache-alpine-46869",
      "capabilities": {
        "apk": {
          "title": "Packages (APK)",
          "payload": {
            ".httpd-rundeps": "0",
            "apk-tools": "2.6.8-r2",
            "apr": "1.5.2-r1",
            "apr-util": "1.5.4-r2"
          }
        },
        "dpkg": {
          "title": "Packages (DPKG)"
        },
        "rpm": {
          "title": "Packages (RPM)"
        }
      }
    }
  }
}

Explicit jq Filter

This is my current filter, which works but explicitly names each optional object identifier-index.

jq '
    [ .containers[] | { 
        name: .container_name, package_inventory: {
            apk: .capabilities.apk.payload?,
            dpkg: .capabilities.dpkg.payload?,
            rpm: .capabilities.rpm.payload?
        }   
    }]  
' minimal.json

Expected Output

My current output (shown below) is correct. The goal isn't to fix the output, but rather to make the filter more flexible.

[
  {
    "name": "/apache-46869",
    "package_inventory": {
      "apk": null,
      "dpkg": {
        "apt": "1.0.9.8.4",
        "libnghttp2-14": "1.18.1-1"
      },
      "rpm": null
    }
  },
  {
    "name": "/nginx-alpine-46869",
    "package_inventory": {
      "apk": {
        ".nginx-rundeps": "0",
        "apk-tools": "2.6.8-r2"
      },
      "dpkg": null,
      "rpm": null
    }
  },
  {
    "name": "/apache-alpine-46869",
    "package_inventory": {
      "apk": {
        ".httpd-rundeps": "0",
        "apk-tools": "2.6.8-r2",
        "apr": "1.5.2-r1",
        "apr-util": "1.5.4-r2"
      },
      "dpkg": null,
      "rpm": null
    }
  }
]

Answer by peak

The trick is to define a helper function. If, for example, you write:

def payloads(keys): . as $in
  | reduce keys[] as $key ({}; .[$key] = ($in|.[$key].payload?) );

then your query becomes (wrapped in [ ... ], as in the original filter, so that the result is a single array):

[ .containers[] | {
    name: .container_name,
    package_inventory: (.capabilities | payloads( ["apk","dpkg","rpm"] ))
} ]

Other variants of course are also possible. For example, you could define payloads as an arity-2 function, and thereby pass in "capabilities".
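
One way to read the arity-2 suggestion is sketched below. It is an illustration only, not code from the answer: the signature payloads(field; keys) is an assumption, and the inline JSON is a trimmed copy of the first container in minimal.json with a placeholder ID ("c1"). It also assumes jq 1.5 or later is on the PATH.

```shell
# Hypothetical arity-2 variant: the first argument names the field to descend into.
result=$(jq -c '
  # payloads(field; keys): look up .[field] once, then collect .payload per key
  def payloads(field; keys):
    .[field] as $in
    | reduce keys[] as $key ({}; .[$key] = ($in | .[$key].payload?));

  [ .containers[] | {
      name: .container_name,
      package_inventory: payloads("capabilities"; ["apk","dpkg","rpm"])
  } ]
' <<'EOF'
{"containers":{"c1":{"container_name":"/apache-46869","capabilities":{
  "apk":{"title":"Packages (APK)"},
  "dpkg":{"title":"Packages (DPKG)","payload":{"apt":"1.0.9.8.4"}},
  "rpm":{"title":"Packages (RPM)"}}}}}
EOF
)
echo "$result"
```

Because the field name is now an argument, the call site no longer needs the leading `.capabilities |`, and the same helper could be pointed at a differently named field.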

Using a JSON object to specify the keys

Here is a variant of payloads/1 illustrating (a) how to avoid reduce, and (b) how the keys can be specified by giving a JSON object as template:

def payloads_at(object):
  . as $in
  | object as $object
  | ({}
     | [($object|keys_unsorted[]) as $key
        | .[$key] = ($in|.[$key].payload?) ])
  | add;

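This could be called like so: payloads_at( {apk, dpkg, rpm} ), where {apk, dpkg, rpm} is jq shorthand for {apk: .apk, dpkg: .dpkg, rpm: .rpm} and only its keys are used. If you want the keys to be dynamically determined: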

(.capabilities | payloads_at( . ) )
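
Putting the dynamic form together, a complete invocation might look like the sketch below. The payloads_at definition is the one given above; the inline JSON is an assumption made for the example (a trimmed copy of the nginx container from minimal.json, with a placeholder ID "c2"), and jq 1.5 or later is assumed for keys_unsorted.

```shell
# Keys are taken from whatever .capabilities contains, so nothing is hard-coded.
result=$(jq -c '
  def payloads_at(object):
    . as $in
    | object as $object
    | ({}
       | [($object|keys_unsorted[]) as $key
          | .[$key] = ($in|.[$key].payload?) ])
    | add;

  .containers[] | {
      name: .container_name,
      package_inventory: (.capabilities | payloads_at(.))
  }
' <<'EOF'
{"containers":{"c2":{"container_name":"/nginx-alpine-46869","capabilities":{
  "apk":{"title":"Packages (APK)","payload":{"apk-tools":"2.6.8-r2"}},
  "dpkg":{"title":"Packages (DPKG)"}}}}}
EOF
)
echo "$result"
```

Every key found under .capabilities appears in the output, including those without a payload, which come out as null.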

Answer by peak

This helper function is perhaps closer to what you're looking for:

def star(pre; template; post):
  pre as $object
  | ({} | [($object|template|keys_unsorted[]) as $key | .[$key] = ($object | .[$key] | post) ])
  | add;

Usage

Explicit list of key names:

star(.capabilities; {apk,dpkg,rpm}; .payload)

Keys of .capabilities:

star(.capabilities; .; .payload)

Example:

.containers[] | { 
    name: .container_name,
    package_inventory: star(.capabilities; .; .payload)
}
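
Since the template argument is an ordinary jq filter, it can also restrict the keys by title, which is what the question asks for ("a title that starts with Packages"). The sketch below is one possible way to do that and is not part of the original answer: the with_entries/select template is an assumption, as is the extra "ports" capability, which is invented here only to show something being filtered out. It assumes jq 1.5 or later on the PATH.

```shell
# Keep only capabilities whose title starts with "Packages"; "ports" is hypothetical data.
result=$(jq -c '
  def star(pre; template; post):
    pre as $object
    | ({} | [($object|template|keys_unsorted[]) as $key
             | .[$key] = ($object | .[$key] | post) ])
    | add;

  .containers[] | {
      name: .container_name,
      package_inventory: star(
        .capabilities;
        # a missing title is treated as "" so startswith does not fail on null
        with_entries(select((.value.title // "") | startswith("Packages")));
        .payload)
  }
' <<'EOF'
{"containers":{"c1":{"container_name":"/apache-46869","capabilities":{
  "apk":{"title":"Packages (APK)"},
  "dpkg":{"title":"Packages (DPKG)","payload":{"apt":"1.0.9.8.4"}},
  "ports":{"title":"Open ports","payload":{"80/tcp":"open"}},
  "rpm":{"title":"Packages (RPM)"}}}}}
EOF
)
echo "$result"
```

The "ports" entry is dropped because its title does not match, while apk and rpm are kept with null payloads, matching the shape of the expected output in the question.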