Regex to remove Key/Value in a JSON Object


I have a JSON like below:

{"queueNumber": "123","field":"name","UserId":[12,12,34],"cur":[{"objectName":"test","uniqueNumber":"123456"}]}

I want to remove a key-value pair if either its key or its value matches one of the given fields.

I am using the regex below. It does not remove the pair when the value is an array, e.g. "UserId":[12,12,34]

(,\s*"(queueNumber|name|uniqueNumber|cur|UserId)\d*":\s*(".*?"|\d+.\d+|\w+))|("(queueNumber|name|uniqueNumber|cur|UserId)\d*":\s*(".*?"|\d+.\d+|\w+)(\s*,)?)

current output:

{"UserId":[12,12,34],"cur":[{"objectName":"test"}]}

Expected output:

{"cur":[{"objectName":"test"}]}

It's quite obvious that regex is not an ideal way to do this, but for now we have to fix it using regex. How can I improve this regex to handle the array case?

I have very little knowledge of regex, so I am asking for help here.

Thanks in advance!

Note: I have to remove it using Regular Expressions and not using any other language. Please don't post any of those answers or mark this as a possible duplicate question.


There are 2 best solutions below

rzwitserloot On

The 'regular' in 'regular expression' isn't some random word. It refers specifically to the notion of a Regular Grammar. As in, Regular Expressions can only be used to parse things described by a Regular Grammar, and, crucially, JSON is not one: its objects and arrays nest to arbitrary depth, which a Regular Grammar cannot describe.

Hence, you cannot parse JSON with REs.

You were asked to. Okay. What if I ask you to break the speed of light? Or make 2+2 equal to 5? The answer is simply: you were asked to do something that is impossible to do.

You can add a bunch of clauses on JSON to make it regular, but then, it would no longer be JSON. You could also write an RE-based JSON parser/modifier which simply does the wrong thing for certain inputs, but then you'd have an improper algorithm.
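To make "does the wrong thing for certain inputs" concrete, here is a minimal sketch. Python is only the harness; the pattern is the one from the question, unchanged. The input string and the helper names remove_pairs and is_valid_json are made up for illustration. A string value containing an escaped quote is enough to trip the lazy ".*?" clause:

```python
import json
import re

# The pattern from the question, unchanged (split over two lines for width).
PATTERN = re.compile(
    r'(,\s*"(queueNumber|name|uniqueNumber|cur|UserId)\d*":\s*(".*?"|\d+.\d+|\w+))'
    r'|("(queueNumber|name|uniqueNumber|cur|UserId)\d*":\s*(".*?"|\d+.\d+|\w+)(\s*,)?)'
)


def remove_pairs(text: str) -> str:
    """Delete every substring the pattern matches (hypothetical helper)."""
    return PATTERN.sub("", text)


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical input: valid JSON whose string value contains escaped quotes.
tricky = r'{"queueNumber":"say \"hi\"","keep":1}'
result = remove_pairs(tricky)

print(is_valid_json(tricky))  # True: the input itself is fine
print(result)                 # {hi\"","keep":1}  (match stopped at the first escaped quote)
print(is_valid_json(result))  # False: the leftover text no longer parses
```

The regex has no way of knowing that the quote after the backslash is part of the value, so it cuts the string in half and leaves the rest behind.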

David G. On

As @rzwitserloot points out, the general case can't be done.

In your case, the specific case can be done. (At least, if your example is truly representative.)

Change (".*?"|\d+.\d+|\w+) to (".*?"|\[[\d,]+\]|\d+.\d+|\w+).
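As a quick check of that change, here is a small sketch that applies the modified pattern to the sample object from the question. Python is used only to run the regex; the names KEYS, VALUE and PATTERN are mine, and the sample is written as valid JSON.

```python
import re

# Key alternation from the question, unchanged.
KEYS = r"(queueNumber|name|uniqueNumber|cur|UserId)\d*"
# Value alternation with the added clause for flat arrays of digits.
VALUE = r'(".*?"|\[[\d,]+\]|\d+.\d+|\w+)'
PATTERN = re.compile(rf'(,\s*"{KEYS}":\s*{VALUE})|("{KEYS}":\s*{VALUE}(\s*,)?)')

# Sample from the question, written as valid JSON.
sample = (
    '{"queueNumber": "123","field":"name","UserId":[12,12,34],'
    '"cur":[{"objectName":"test","uniqueNumber":"123456"}]}'
)

result = PATTERN.sub("", sample)
print(result)  # {"field":"name","cur":[{"objectName":"test"}]}
```

The "UserId" pair is now removed. Two things are worth noting about this run: "cur" survives only because its value starts with [{ and so matches none of the value clauses, and "field":"name" is left in place, since name appears in the alternation only in key position. Removing a pair by its value would need a further clause.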

I would then go back and review the third and fourth clauses there. The third should probably have been \d+\.\d+, and might justify adding a variant \d+ for plain integers. On the other hand, maybe you meant to also catch 1E10, but not 1.1E10. The fourth should only be matching select keywords (true, false, null), so you should probably just name them.
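A short sketch of what those two clauses accept before and after tightening, again with Python only as the harness. The tightened VALUE below is one possible reading of the suggestions above, not a definitive pattern, and the sample object is hypothetical.

```python
import re

# Unescaped dot: "." matches any character, so non-numbers slip through.
loose_decimal = re.compile(r"\d+.\d+")
# Escaped dot: digits, a literal ".", digits.
strict_decimal = re.compile(r"\d+\.\d+")

print(bool(loose_decimal.fullmatch("1x2")))    # True
print(bool(strict_decimal.fullmatch("1x2")))   # False
print(bool(strict_decimal.fullmatch("1.25")))  # True

# \w+ accepts any bare word; naming the JSON keywords is narrower.
loose_word = re.compile(r"\w+")
keywords = re.compile(r"true|false|null")

print(bool(loose_word.fullmatch("undefined")))  # True
print(bool(keywords.fullmatch("undefined")))    # False

# One possible tightened pattern (an assumption, see the lead-in).
KEYS = r"(queueNumber|name|uniqueNumber|cur|UserId)\d*"
VALUE = r'(".*?"|\[[\d,]+\]|\d+\.\d+|\d+|true|false|null)'
PATTERN = re.compile(rf'(,\s*"{KEYS}":\s*{VALUE})|("{KEYS}":\s*{VALUE}(\s*,)?)')

# Hypothetical sample covering a decimal, a keyword and an integer.
sample = '{"queueNumber": 12.5,"cur1": true,"keep": 7,"UserId": null}'
tightened = PATTERN.sub("", sample)
print(tightened)  # {"keep": 7}
```

Exponent forms such as 1E10 are not covered by this VALUE; if they can occur, they need their own clause.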