Create a JSON file for custom entity types in Watson Knowledge Studio

I am trying to upload a set of customized entity types and subtypes for a WKS instance.

Here is a view of the WKS interface, in the section where you can define entities and sub-entities.

The upload button expects a JSON file.

I had previously created a set by hand and downloaded its JSON file.

The beginning of that file looks like this:

{"entityTypes":[{"id":"78361798-b77e-4728-9b6a-f56539c12bcd","label":"Calificativo","sireProp":{"mentionType":null,"subtypes":["Bueno_extremo","Bueno_moderado","Regular","Malo_moderado","Malo_extremo"],"roles":["78361798-b77e-4728-9b6a-f56539c12bcd"],"clazz":null,"color":null,"hotkey":null,"backGroundColor":null,"active":true,"roleOnly":false},"creationDate":1583241330349,"source":null,"modifiedDate":0,"typeType":null,"typeClass":null,"typeVersion":null,"typeDesc":null,"typeSuperType":null,"typeSuperTypeId":null,"typeCreateDate":null,"typeUpdateDate":null,"typeProvenance":null,"alchemyAPITypes":null,"nluAPITypes":null},{"id":"daecb92b-0ce7-4a47-942a-68b50d0cb2fd","label":"TV","sireProp":{"mentionType":null,"subtypes":["Decodificador","Servicio_de_tv"],"roles":

The structure is generally clear, but the file contains IDs, both for the set of entity types and for the items in it.
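
The layout of a downloaded file can be inspected with a short Python sketch that lists each entity type's `id`, `label`, `subtypes` and `roles`. It assumes only the keys visible in the excerpt above; the function name `summarize_entity_types` and the file name `wks_types.json` are hypothetical.

```python
def summarize_entity_types(data):
    """Return (id, label, subtypes, roles) for each entity type in a WKS type-system dict.

    Only keys seen in a downloaded WKS file are assumed:
    entityTypes[].id, entityTypes[].label, entityTypes[].sireProp.subtypes / .roles.
    """
    rows = []
    for et in data.get("entityTypes", []):
        sire = et.get("sireProp") or {}
        rows.append((
            et.get("id"),
            et.get("label"),
            sire.get("subtypes") or [],  # null in the file becomes an empty list
            sire.get("roles") or [],
        ))
    return rows
```

For example: `with open("wks_types.json", encoding="utf-8") as f: print(summarize_entity_types(json.load(f)))` (after `import json`). In the excerpt, each type's own `id` also appears in its `roles` list.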

Is there a way to know these IDs in advance, or to generate them, so that I can build the whole JSON file with the types and subtypes I want to use and then upload it?

I tried using empty strings ("") in place of the IDs, but I got an error message and the upload was not allowed.

Accepted answer:

According to the documentation, WKS does not support importing customized JSON files that differ from what a WKS workspace exports. However, in my tests, a UUID worked as a value for the id field. You can generate one with the following bash command:

$ uuidgen | tr '[:upper:]' '[:lower:]'
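
If the file is being generated from Python anyway, the standard library's `uuid` module produces the same kind of value as the `uuidgen | tr ...` pipeline: `str(uuid.uuid4())` is already lowercase, in the 8-4-4-4-12 hex form seen in the exported file. The helper name `new_type_id` and the regex `UUID_RE` are illustrative; this only reproduces the format, and whether WKS specifically requires version-4 UUIDs is not stated in the answer.

```python
import re
import uuid

# Same shape as the IDs in the exported file, e.g. 78361798-b77e-4728-9b6a-f56539c12bcd
UUID_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")


def new_type_id():
    """Return a random lowercase UUID string, like `uuidgen | tr '[:upper:]' '[:lower:]'`."""
    return str(uuid.uuid4())


print(new_type_id())
```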

Another answer:

This Python script generates a JSON file in a format that WKS accepts:

import uuid
import json

# Generate random UUIDs: ent_id for the "pid" field, lbl01_id for the entity type
ent_id,lbl01_id = uuid.uuid4(), uuid.uuid4()

json_out = {}
json_out.update({
                "entityTypes":[{
                    "id":str(lbl01_id), "label":"Calificativo",
                    "sireProp":
                        {
                        "mentionType":None,
                        "subtypes":["Bueno_extremo", "Bueno_moderado", "Regular", "Malo_moderado", "Malo_extremo"],
                        "roles":[str(lbl01_id)], "clazz":None, # roles lists this label's own ID plus the IDs of related labels, if any
                        "color":None, "hotkey":None, "backGroundColor":None, "active":True, "roleOnly":False
                        },
                    "creationDate":1583241330349, "source":None, "modifiedDate":1583247016579, "typeType":None,
                    "typeClass":None, "typeVersion":None, "typeDesc":None, "typeSuperType":None, "typeSuperTypeId":None,
                    "typeCreateDate":None, "typeUpdateDate":None, "typeProvenance":None, "alchemyAPITypes":None,
                    "nluAPITypes":None
                    }],
                "sireInfo":{
                    "entityProp":{
                        "mentionType":[{"color":"white", "hotkey":"1", "backGroundColor":"#AA00FF", "name":"NAM"},
                                       {"color":"black", "hotkey":"2", "backGroundColor":"#00FF7F", "name":"NOM"},
                                       {"color":"black", "hotkey":"3", "backGroundColor":"#AAFFFF", "name":"PRO"},
                                       {"color":"white", "hotkey":"4", "backGroundColor":"gray", "name":"NONE"}],
                        "subtypes":None,
                        "roles":None,
                        "clazz":[{"color":"#A5A5A5", "hotkey":"3", "backGroundColor":"white", "name":"SPC"},
                                 {"color":"black", "hotkey":"2", "backGroundColor":"#00FF7F", "name":"NEG"},
                                 {"color":"black", "hotkey":"1", "backGroundColor":"#AAFFFF", "name":"GEN"}],
                        "color":None,
                        "hotkey":None,
                        "backGroundColor":None,
                        "active":True,
                        "roleOnly":False
                        },
                    "relationProp":{
                        "tense":[{"name":"PAST"}, {"name":"PRESENT"}, {"name":"FUTURE"}, {"name":"UNSPECIFIED"}],
                        "modality":[{"name":"ASSERTED"}, {"name":"OTHER"}],
                        "clazz":[{"name":"SPECIFIC"}, {"name":"NEG"}, {"name":"OTHER"}],
                        "backGroundColor":None, "color":None, "hotkey":None, "active":True}
                    },
                "functionalEntityTypes":[
                    {"id":"CATCH_ALL_ENTITY_ID", "label":"*",
                    "sireProp":{
                        "mentionType":None, "subtypes":None, "roles":None, "clazz":None, "color":None,
                        "hotkey":None, "backGroundColor":None, "active":True, "roleOnly":False},
                    "creationDate":1487227572757, "source":None, "modifiedDate":0, "typeType":None,
                    "typeClass":None, "typeVersion":None, "typeDesc":None, "typeSuperType":None,
                    "typeSuperTypeId":None, "typeCreateDate":None, "typeUpdateDate":None, "typeProvenance":None,
                    "alchemyAPITypes":None, "nluAPITypes":None
                    }],
                "pid":str(ent_id), "modified_date":1583247016579, "kgimported":False
                })

with open('json_file.json', 'w') as outfile:
    json.dump(json_out, outfile)

This generates only one entity type. To generate more, repeat the object from "id" through "nluAPITypes" inside the "entityTypes" list, once per entity type you want to add, each with its own newly generated ID.
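
Rather than copying that object by hand, the repetition can be done in a loop. This is a minimal sketch: `make_entity_type` and `spec` are hypothetical names, the field values mirror the script above, `roles` holds only the type's own ID as in the single-entity example, and stamping `creationDate`/`modifiedDate` with the current time assumes those fields are Unix epoch milliseconds (1583241330349 corresponds to 3 March 2020, 13:15:30 UTC).

```python
import time
import uuid


def make_entity_type(label, subtypes, now_ms):
    """Build one "entityTypes" entry with the same fields as the script above (hypothetical helper)."""
    type_id = str(uuid.uuid4())  # fresh lowercase UUID per entity type
    return {
        "id": type_id, "label": label,
        "sireProp": {
            "mentionType": None, "subtypes": list(subtypes), "roles": [type_id],
            "clazz": None, "color": None, "hotkey": None, "backGroundColor": None,
            "active": True, "roleOnly": False,
        },
        "creationDate": now_ms, "source": None, "modifiedDate": now_ms,
        "typeType": None, "typeClass": None, "typeVersion": None, "typeDesc": None,
        "typeSuperType": None, "typeSuperTypeId": None, "typeCreateDate": None,
        "typeUpdateDate": None, "typeProvenance": None,
        "alchemyAPITypes": None, "nluAPITypes": None,
    }


# Assumed: timestamps are epoch milliseconds, like 1583241330349 in the exported file
now_ms = int(time.time() * 1000)

# Labels and subtypes taken from the question's exported file
spec = {
    "Calificativo": ["Bueno_extremo", "Bueno_moderado", "Regular", "Malo_moderado", "Malo_extremo"],
    "TV": ["Decodificador", "Servicio_de_tv"],
}
entity_types = [make_entity_type(label, subs, now_ms) for label, subs in spec.items()]
```

The resulting `entity_types` list would replace the single-element "entityTypes" list in `json_out`, with "sireInfo", "functionalEntityTypes", "pid" and the other top-level keys left as in the script.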

Relationship types can also be included in the same object, under "relationshipTypes".
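
Before uploading a hand-built file, a quick local check can catch the problem from the question (empty strings where IDs belong). This sketch, with the hypothetical name `check_entity_type_ids`, only checks that each entity type ID is a lowercase hyphenated UUID (the form the accepted answer produces), that IDs are unique, and that every `roles` entry matches an entity type ID in the file. These rules are inferred from the downloaded example and the answers, not from WKS documentation, so passing the check does not guarantee WKS will accept the file.

```python
import uuid


def check_entity_type_ids(data):
    """Return a list of problems found in the entityTypes of a WKS type-system dict."""
    problems = []
    entity_types = data.get("entityTypes", [])
    ids = [et.get("id") for et in entity_types]
    known = set(ids)

    for et in entity_types:
        type_id, label = et.get("id"), et.get("label")
        try:
            # str(uuid.UUID(x)) gives the canonical lowercase, hyphenated form
            if str(uuid.UUID(type_id)) != type_id:
                problems.append(f"{label}: id {type_id!r} is not a lowercase hyphenated UUID")
        except (TypeError, ValueError, AttributeError):
            problems.append(f"{label}: id {type_id!r} is not a UUID")

    seen = set()
    for type_id in ids:
        if type_id in seen:
            problems.append(f"duplicate id {type_id!r}")
        seen.add(type_id)

    for et in entity_types:
        for role in (et.get("sireProp") or {}).get("roles") or []:
            if role not in known:
                problems.append(f"{et.get('label')}: role {role!r} matches no entity type id")
    return problems
```

An empty list means none of these checks found a problem.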