Low scores for currency entities in entity extraction with IBM Watson NLU


I'm trying to extract entities and relations from text documents using NLU and WKS. The results are good overall, but I would like to understand why Watson NLU does not recognize some entities of my custom model in very similar documents. For example:

Text 1 in Portuguese: "Dá à causa o valor de R$ 10.000,00" => DIDN'T WORK

Text 2 in Portuguese: "Dá à causa o valor de R$ 20.000,00" => WORKED!

Text 3 in Portuguese: "Dá à causa o valor de R$ 10.000,01" => WORKED!

Watson recognizes my entities and relations in Text 2 and Text 3, but not in Text 1. The same thing happens with:

Text 4 in Portuguese: "Dá à causa o valor esperado de R$ 20.000,00" => DIDN'T WORK

Text 5 in Portuguese: "Dá à causa o valor de R$ 20.000,00" => WORKED!
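To check whether the misses follow a pattern (specific amounts, extra words before the amount), the cases above could be generated and tested systematically. The following is only a minimal sketch: `format_brl`, `build_probes`, and `run_probes` are hypothetical helper names, and `extract` is an assumed callable that wraps the NLU request with the custom model and returns the list of entities found. It is not part of any Watson SDK.

```python
from typing import Callable, Dict, Iterable, List


def format_brl(amount_cents: int) -> str:
    """Format an integer number of centavos in Brazilian style, e.g. 'R$ 10.000,00'."""
    reais, centavos = divmod(amount_cents, 100)
    # Python groups thousands with ','; Brazilian notation uses '.' instead.
    grouped = f"{reais:,}".replace(",", ".")
    return f"R$ {grouped},{centavos:02d}"


def build_probes(
    amounts_cents: Iterable[int],
    template: str = "Dá à causa o valor de {amount}",
) -> List[str]:
    """Build one probe sentence per amount from a sentence template."""
    return [template.format(amount=format_brl(a)) for a in amounts_cents]


def run_probes(
    probes: Iterable[str],
    extract: Callable[[str], list],
) -> Dict[str, bool]:
    """Map each probe sentence to True if `extract` returned at least one entity.

    `extract` is an assumption: a user-supplied function that sends the text
    to NLU with the custom model and returns the entities it found.
    """
    return {text: bool(extract(text)) for text in probes}


if __name__ == "__main__":
    # Amounts from Texts 1 to 3 above, expressed in centavos.
    for sentence in build_probes([1_000_000, 2_000_000, 1_000_001]):
        print(sentence)
```

Running the same amounts through a second template such as "Dá à causa o valor esperado de {amount}" would cover the Text 4 case as well, which may help show whether the failures depend on the amount, the surrounding words, or both.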

A sample of a tagged document was attached as an image (not reproduced here).

Dataset:

  • Training set: 250 documents (85%)
  • Test set: 35 documents (12%)
  • Blind set: 10 documents (3%)

  • I have already tried other splits.

  • Every document contains the entities and the relation exactly once, with variations in wording.

I have already tagged more documents covering this scenario, but it didn't improve the results. I also tried tagging every currency value in the documents.

What can I do to improve the results?
