Use end-user feedback to improve the performance of a Knowledge Studio model


I have an already trained Knowledge Studio model that is working, and I've deployed it in a Natural Language Understanding service. The entities and relations returned by NLU are not always accurate, so I'm trying to let the end user correct errors in the extracted information and improve the model with that feedback.

Since an already trained model can be exported to a new instance of WKS, with its content (sentences, words, and the annotated entities and relations) structured in an easily understandable JSON format, I'd like to know whether it's possible to follow the same structure to tag new document text and upload it to WKS to reflect this user feedback, and hopefully improve the model.


Well, I've found the answer by trying it. I downloaded the corpus from Knowledge Studio and analyzed the structure of the JSON in each file (inside the "./gt" folder).

At the end of each file there are JSON entries for each previously annotated entity, so I used them as an example. Each entry has an id that encodes two values: the sentence number and the mention number (both consecutive, starting from zero). The mention number restarts for every sentence, and sentences are separated (at least as far as I could tell) by "\n" and also by ". " (note the space after the period). Each entry also holds the character offsets of the beginning and end of the mention; when counting characters, the system does not take the "\" character into account. Here's an example of what one looks like.

{
  "id" : "s3-m0",  // id for the first mention in the fourth sentence
  "properties" : {
    "SIRE_MENTION_TYPE" : "NONE",
    "SIRE_MENTION_CLASS" : "SPC",
    "SIRE_ENTITY_SUBTYPE" : "NONE",
    "SIRE_MENTION_ROLE" : "TEST_ENTITY"  // entity type name
  },
  "type" : "TEST_ENTITY",  // entity type name again
  "begin" : 11,  // character offset where the mention begins
  "end" : 19,  // character offset where the mention ends
  "inCoref" : false
}
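
In case it helps, here's a minimal Python sketch (my own, not part of any WKS tooling) that assembles an entry with this shape. The zero-based id scheme and the backslash rule are just the observations above, and whether "end" is exclusive or points at the last character is something you should verify against your own gt files:

import json

def wks_offset(text, raw_index):
    # Per the note above, "\" characters don't seem to count toward
    # WKS offsets, so subtract any backslashes before the index.
    return raw_index - text[:raw_index].count("\\")

def build_mention(sentence_idx, mention_idx, begin, end, entity_type):
    # Shapes an entry exactly like the gt JSON example above.
    # Ids are zero-based ("s<sentence>-m<mention>"), and the mention
    # counter restarts at every sentence.
    return {
        "id": "s{}-m{}".format(sentence_idx, mention_idx),
        "properties": {
            "SIRE_MENTION_TYPE": "NONE",
            "SIRE_MENTION_CLASS": "SPC",
            "SIRE_ENTITY_SUBTYPE": "NONE",
            "SIRE_MENTION_ROLE": entity_type,
        },
        "type": entity_type,
        "begin": begin,
        "end": end,
        "inCoref": False,
    }

text = "First one.\nIn here, a mention sits."
raw = text.find("mention")
entry = build_mention(
    sentence_idx=1,  # second sentence, zero-based, so s1
    mention_idx=0,   # first mention in that sentence, so m0
    begin=wks_offset(text, raw),
    end=wks_offset(text, raw + len("mention")),  # assumes exclusive end
    entity_type="TEST_ENTITY",
)
print(json.dumps(entry, indent=2))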

If you are tagging a new mention whose type was not previously included in the type system, you'll have to create that entity type manually first. After adding an entry like this to each JSON (a sketch of that step follows below), upload the modified corpus to Knowledge Studio and create an annotation set with the uploaded documents. Then create a new task to annotate that set, and you should see that the documents are already annotated with the entries you added manually. So the model is ready to be trained with these new examples once you submit the documents and accept the task. I think it should be similar for manually annotating relations.
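
And here's a hedged sketch of the patching step itself, reusing build_mention() from the sketch above. I'm assuming the mention entries live in a top-level "mentions" array; check the actual key name in your own gt files before trusting this:

import json

path = "./gt/my_document.json"  # one of the downloaded corpus files

with open(path, encoding="utf-8") as f:
    doc = json.load(f)

# Append the corrected mention; "mentions" is my guess at the
# array's key name, so verify it against your downloaded corpus.
doc.setdefault("mentions", []).append(
    build_mention(3, 0, 11, 19, "TEST_ENTITY")
)

with open(path, "w", encoding="utf-8") as f:
    json.dump(doc, f, ensure_ascii=False, indent=2)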

Hope this helps someone else!