Using Recursive feature while Flattening in Snowflake

262 Views Asked by At

I have a JSON string, which needs to be parsed in order to retrieve particular values.Here is an example I am working with;

{
  "assignable_type": "SHIPMENT",
  "rule": {
    "rules": [
      {
        "meta_data": {},
        "rules": [
          {
            "op": "IN",
            "target": "CLIENT_FID",
            "type": "ARRAY_VALUE_ASSERTION",
            "values": [
              "flx::core:client:dbid/64171",
              "flx::core:client:dbid/76049",
              "flx::core:client:dbid/34040",
              "flx::core:client:dbid/61806"
            ]
          }
        ],
        "type": "AND"
      }
    ],
    "type": "OR"
  },
  "type": "USER_DEFINED"
}

The goal is to get the values when "target":"CLIENT_FID".

Expected Output for this JSON file should be ;

["flx::core:client:dbid/64171",
   "flx::core:client:dbid/76049",
   "flx::core:client:dbid/34040",
   "flx::core:client:dbid/61806"]
 

Here, as we can see rules is a list of dictionaries, and we can have nested lists as seen in the example.

Similarly, we have other JSON file of following type;

{
  "assignable_type": "SHIPMENT",
  "rule": {
    "rules": [
      {
        "meta_data": {},
        "rules": [
          {
            "op": "IN",
            "target": "PORT_OF_ENTRY_FID",
            "type": "ARRAY_VALUE_ASSERTION",
            "values": [
              "flx::core:port:dbid/566788",
              "flx::core:port:dbid/566931",
              "flx::core:port:dbid/561482"
            ]
          }
        ],
        "type": "AND"
      },
      {
        "meta_data": {},
        "rules": [
          {
            "op": "IN",
            "target": "PORT_OF_LOADING_FID",
            "type": "ARRAY_VALUE_ASSERTION",
            "values": [
              "flx::core:port:dbid/561465"
            ]
          },
          {
            "op": "IN",
            "target": "SHIPMENT_MODE",
            "type": "ARRAY_VALUE_ASSERTION",
            "values": [
              0
            ]
          },
          {
            "op": "IN",
            "target": "CLIENT_FID",
            "type": "ARRAY_VALUE_ASSERTION",
            "values": [
              "flx::core:client:dbid/28169"
            ]
          }
        ],
        "type": "AND"
      }
    ],
    "type": "OR"
  },
  "type": "USER_DEFINED"
}

For the second example ,

Expected Output shd be;

 ["flx::core:client:dbid/28169"]

As. seen, we may need to read the values at different depths in the file. In order to address this issue, I used following code;

/* first convert the string to a JSON object in cte1 */


with cte1 as (
select to_json(json_string) as json_rep,
      parse_json(json_extract_path_text(json_rep, 'rule.rules')) as list_elem

from table 1),

cte2 as (select split_array,
     json_extract_path_text(split_array, 'target') as target_client

 from (
select json_rep,
       list_elem, 
       t.value as split_array,
       typeof(split_array) as obj_type,
       index
     
from cte1,
     table(flatten(cte1.list_elem, recursive=>true)) as t) temp /* use recursive feature */

where split_array ilike '%"target":"client_fid"%' /* filter for those rows containing this string */ 
     and obj_type='OBJECT')

select 
        split_array,
        json_extract_path_text(split_array, 'values') as client_values
        
 from cte2
 where target_client='CLIENT_FID'; /* filter the rows where we have the dictionary containing client fid */ 

In order to address the issue of varying depth at which client_fid is found we're recursing while flattening the string into rows. The output which is obtained for both of above inputs is provided below,

For the first String we get the actual output in variable client_values as

["flx::core:client:dbid/64171",
           "flx::core:client:dbid/76049",
           "flx::core:client:dbid/34040",
           "flx::core:client:dbid/61806"]

Similarly, for the second string we get the actual output as

["flx::core:client:dbid/28169"]

As seen the code seems to be working in getting the correct output, but the way I filtered in the final query for target_client='CLIENT_FID'; it seems to be a very hacky way. Hence is it possible to get a better approach to resolve the issue of retrieving client fid values though the depth can vary in the given input.

Help is appreciated.

0

There are 0 best solutions below