Data extraction for specific string

86 Views Asked by At

I have a long list of JSON data, with repeats of contents similar to followings.

Due to the original JSON file is too long, I will just shared the hyperlinks here. This is a result generated from a database called RegulomeDB.

Direct link to the JSON file

I would like to extract specific data (eQTLs) from "method": "eQTLs" and "value": "xxxx", and put them into 2 columns (tab delimited) exactly like below. Note: "value":"xxxx" is extracted right after "method": "eQTLs"is detected.

eQTLs   firstResult, secondResult, thirdResult, ...

In this example, the desired output is:

eQTLs   EIF3S8, EIF3CL

I've tried using a python script but was unsuccessful.

import json
with open('file.json') as f:
    f_json = json.load(f)
    print 'f_json[0]['"method": "eQTLs"'] + "\t" + f_json[0]["value"]

Thank you for your kind help.

2

There are 2 best solutions below

6
SaSkY On BEST ANSWER

Try this:

cat file.json | grep -iE '"method":\s*"eQTLs"[^}]*' -o | cut -d ',' -f 1,5 | sed -r 's/"|:|method|value//gi' | sed 's/\s*eqtls,\s*//gi' | tr '\n' ',' | sed 's/,$/\n/g' | sed 's/,/, /g' | xargs echo -e 'eQTLs\x09'
0
Reino On

Maybe you'll find the JSON-parser useful. It can open urls and can manipulate strings any way you want:

$ xidel -s "https://regulomedb.org/regulome-search/?regions=chr16:28539847-28539848&genome=GRCh37&format=json" \
  -e '"eQTLs	"||join($json("@graph")()[method="eQTLs"]/value,", ")'
eQTLs   EIF3S8, EIF3CL

Or with the XPath/XQuery 3.1 syntax:

-e '"eQTLs	"||join($json?"@graph"?*[method="eQTLs"]?value,", ")'