glom assign based on data


In the JSON below, I want to mask personal information based on the value of each detail entry's type field. I have two scenarios. In scenario 1, when type == 'firstName', I want to set (assign) its valueString to "Masked". In scenario 2, when type matches the pattern "first****Name" (for example "first-4fa999-f1999-Name"), I also want to set its valueString to "Masked". Does anyone have suggestions for writing glom assign statements that handle both cases?
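Before getting to glom, it may help to pin down the two conditions as plain predicates. The following is a minimal plain-Python sketch (it does not use glom), assuming "first****Name" means "starts with first, ends with Name, anything in between"; the names FIRST_NAME_PATTERN, is_exact_first_name and matches_first_name_pattern are hypothetical.

```python
import re

# Assumed reading of "first****Name": starts with "first", ends with "Name".
FIRST_NAME_PATTERN = re.compile(r"first.*Name")


def is_exact_first_name(type_value: str) -> bool:
    """Scenario 1: the type must be exactly 'firstName'."""
    return type_value == "firstName"


def matches_first_name_pattern(type_value: str) -> bool:
    """Scenario 2: the whole type string must fit the pattern."""
    return FIRST_NAME_PATTERN.fullmatch(type_value) is not None


for t in ["firstName", "first-4fa999-f1999-Name", "userName", "birth-4fa999-f1999-Date"]:
    print(t, is_exact_first_name(t), matches_first_name_pattern(t))
```

Because ".*" can match an empty string, the scenario 2 pattern also matches "firstName", so a single regex test is enough to cover both scenarios.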

Example JSON:

{
  "id": "985babac-9999-8888-8887",
  "entity": [
    {
      "what": {
        "reference": "4lincoln-123-11eb-bc1a-732f"
      },
      "detail": [
        {
          "type": "uuid",
          "valueString": "4obama-f199-77eb-bc1a-555555704d2f"
        },
        {
          "type": "firstName",
          "valueString": "John"
        },
        {
          "type": "userName",
          "valueString": "Johns"
        },
        {
          "type": "middleInitial",
          "valueString": "S"
        },
        {
          "type": "lastName",
          "valueString": "Trump"
        },
        {
          "type": "first-4fa999-f1999-Name",
          "valueString": "John"
        },
        {
          "type": "birth-4fa999-f1999-Date",
          "valueString": "2010-01-01"
        }
      ]
    }
  ]
}

The updated output should look like this:

{
  "id": "985babac-9999-8888-8887",
  "entity": [
    {
      "what": {
        "reference": "4lincoln-123-11eb-bc1a-732f"
      },
      "detail": [
        {
          "type": "uuid",
          "valueString": "4obama-f199-77eb-bc1a-555555704d2f"
        },
        {
          "type": "firstName",
          "valueString": "Masked"
        },
        {
          "type": "userName",
          "valueString": "Johns"
        },
        {
          "type": "middleInitial",
          "valueString": "S"
        },
        {
          "type": "lastName",
          "valueString": "Trump"
        },
        {
          "type": "first-4fa999-f1999-Name",
          "valueString": "Masked"
        },
        {
          "type": "birth-4fa999-f1999-Date",
          "valueString": "2010-01-01"
        }
      ]
    }
  ]
}
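For reference, here is a minimal plain-Python sketch (not glom) that turns the example JSON into the expected output in a single pass. It assumes the same reading of "first****Name" as starting with first and ending with Name, works on a deep copy so the input is left untouched, and walks every entity rather than only the first; mask_first_names and MASK_TYPE are hypothetical names.

```python
import copy
import re

# Assumption: "first****Name" = starts with "first", ends with "Name".
# ".*" may be empty, so this also covers the exact "firstName" case.
MASK_TYPE = re.compile(r"first.*Name")


def mask_first_names(doc: dict, mask: str = "Masked") -> dict:
    """Return a copy of doc with matching detail valueStrings replaced by mask."""
    result = copy.deepcopy(doc)  # leave the caller's dict unchanged
    for entity in result.get("entity", []):
        for detail in entity.get("detail", []):
            if MASK_TYPE.fullmatch(detail.get("type", "")):
                detail["valueString"] = mask
    return result


example = {
    "id": "985babac-9999-8888-8887",
    "entity": [
        {
            "what": {"reference": "4lincoln-123-11eb-bc1a-732f"},
            "detail": [
                {"type": "uuid", "valueString": "4obama-f199-77eb-bc1a-555555704d2f"},
                {"type": "firstName", "valueString": "John"},
                {"type": "userName", "valueString": "Johns"},
                {"type": "middleInitial", "valueString": "S"},
                {"type": "lastName", "valueString": "Trump"},
                {"type": "first-4fa999-f1999-Name", "valueString": "John"},
                {"type": "birth-4fa999-f1999-Date", "valueString": "2010-01-01"},
            ],
        }
    ],
}

masked = mask_first_names(example)
print([d["valueString"] for d in masked["entity"][0]["detail"]])
```

Only the firstName and first-4fa999-f1999-Name entries change, which matches the expected output above; userName does not match because fullmatch requires the string to start with "first".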

Answer:

I came up with the following solution. Can this be done in a single glom call instead of calling assign once per matching field?

import logging
import re
import time

from glom import glom, assign

LOGGING_FORMAT = '%(asctime)s - [%(filename)s:%(name)s:%(lineno)d] - %(levelname)s - %(message)s'
LOGLEVEL = logging.INFO

logging.basicConfig(level=LOGLEVEL,format=LOGGING_FORMAT)
logger = logging.getLogger(__name__)


start_time = time.time()

target = {
    "id": "985babac-9999-8888-8887",
    "entity": [
        {
            "what": {
                "reference": "4lincoln-123-11eb-bc1a-732f"
            },
            "detail": [
                {
                    "type": "uuid",
                    "valueString": "4obama-f199-77eb-bc1a-555555704d2f"
                },
                {
                    "type": "firstName",
                    "valueString": "John"
                },
                {
                    "type": "userName",
                    "valueString": "Johns"
                },
                {
                    "type": "middleInitial",
                    "valueString": "S"
                },
                {
                    "type": "lastName",
                    "valueString": "Trump"
                },
                {
                    "type": "first-4fa999-f1999-Name",
                    "valueString": "John"
                },
                {
                    "type": "birth-4fa999-f1999-Date",
                    "valueString": "2010-01-01"
                }
            ]
        }
    ]
}

# Earlier attempt (scenario 1 only): exact match on 'firstName'.
# def myupdate(x):
#     for count, item in enumerate(x):
#         myspec = 'entity.0.detail.{}.valueString'.format(count)
#         if item == 'firstName':
#             _ = assign(target,myspec,'Masked')

# Types to mask: first...Name, last...Name, middle...Initial, birth...Date.
piiRegex = re.compile(r'^first.*Name$|^last.*Name$|^middle.*Initial$|^birth.*Date$')


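def myupdate(types):
    # 'types' is the list of detail 'type' values produced by ['type'] in the spec.
    # The path is hardcoded to entity 0, so other entities are not masked.
    for count, item in enumerate(types):
        myspec = 'entity.0.detail.{}.valueString'.format(count)
        if piiRegex.search(item):
            # assign() mutates the module-level target dict in place.
            assign(target, myspec, 'Masked')
    # No return statement, so glom stores None under 'result'.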

# Take entity 0's detail list, extract each entry's 'type', and pass that list to myupdate.
spec = {'result': ('entity.0.detail', ['type'], myupdate)}

xyz = glom(target, spec)
print(xyz)
print(target)




logger.info("Program completed in --- %s seconds ---", time.time() - start_time)

===============

Result:

{'result': None}
{'id': '985babac-9999-8888-8887', 'entity': [{'what': {'reference': '4lincoln-123-11eb-bc1a-732f'}, 'detail': [{'type': 'uuid', 'valueString': '4obama-f199-77eb-bc1a-555555704d2f'}, {'type': 'firstName', 'valueString': 'Masked'}, {'type': 'userName', 'valueString': 'Johns'}, {'type': 'middleInitial', 'valueString': 'Masked'}, {'type': 'lastName', 'valueString': 'Masked'}, {'type': 'first-4fa999-f1999-Name', 'valueString': 'Masked'}, {'type': 'birth-4fa999-f1999-Date', 'valueString': 'Masked'}]}]}
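The {'result': None} line appears because myupdate has no return statement, so glom stores None under 'result'; the masking itself happens as a side effect on the global target, and the hardcoded entity.0 path only reaches the first entity. Below is a minimal plain-Python sketch (not a glom solution) of the same in-place masking that reuses the answer's regex, walks every entity, and returns the dotted paths it changed so the caller gets a meaningful result. The name mask_pii_in_place and the small test document are hypothetical.

```python
import re

# Same pattern as piiRegex in the answer.
PII_REGEX = re.compile(r'^first.*Name$|^last.*Name$|^middle.*Initial$|^birth.*Date$')


def mask_pii_in_place(doc, mask="Masked"):
    """Hypothetical helper: mask matching detail values in place.

    Returns the dotted paths that were changed instead of None.
    """
    changed = []
    for e_idx, entity in enumerate(doc.get("entity", [])):
        for d_idx, detail in enumerate(entity.get("detail", [])):
            if PII_REGEX.search(detail.get("type", "")):
                detail["valueString"] = mask
                changed.append("entity.{}.detail.{}.valueString".format(e_idx, d_idx))
    return changed


doc = {
    "entity": [
        {"detail": [{"type": "firstName", "valueString": "John"},
                    {"type": "userName", "valueString": "Johns"}]},
        {"detail": [{"type": "lastName", "valueString": "Doe"}]},
    ]
}
print(mask_pii_in_place(doc))
```

The returned paths use the same dotted format as the answer's myspec strings, so if you prefer to keep the write step in glom you could feed each of them to assign instead of setting the key directly.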