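I'm trying to extract some entities and relations from text documents using Watson Natural Language Understanding (NLU) and Watson Knowledge Studio (WKS). I got good results, but I would like to understand why NLU does not recognize some entities of my custom model in similar documents. For example ("Dá à causa o valor de" roughly means "assigns the case a value of"):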
Text 1 in Portuguese: "Dá à causa o valor de R$ 10.000,00" => DIDN'T WORK
Text 2 in Portuguese: "Dá à causa o valor de R$ 20.000,00" => WORKED!
Text 3 in Portuguese: "Dá à causa o valor de R$ 10.000,01" => WORKED!
Watson recognizes my entities and relations in Text 2 and Text 3, but not in Text 1. The same thing happens with:
Text 4 in Portuguese: "Dá à causa o valor esperado de R$ 20.000,00" => DIDN'T WORK
Text 5 in Portuguese: "Dá à causa o valor de R$ 20.000,00" => WORKED!
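To narrow down which change breaks recognition, it may help to probe the model with systematic minimal pairs: the same sentence with only the amount, or one extra word, changed. A minimal Python sketch that builds such variants; `format_brl` and `build_probes` are hypothetical helpers, and sending each text to NLU with the custom model is left out:

```python
# Hypothetical helpers for building minimal-pair probes of the sample sentence.
# The template and amounts are illustrative, not part of any IBM tooling.

def format_brl(amount_cents: int) -> str:
    """Format an integer number of centavos as Brazilian reais, e.g. R$ 10.000,00."""
    reais, centavos = divmod(amount_cents, 100)
    # Brazilian Portuguese groups thousands with '.' and uses ',' for decimals.
    grouped = f"{reais:,}".replace(",", ".")
    return f"R$ {grouped},{centavos:02d}"


def build_probes(amounts_cents, qualifiers=("", "esperado ")):
    """Return the sample sentence for every combination of qualifier and amount."""
    probes = []
    for qualifier in qualifiers:
        for cents in amounts_cents:
            probes.append(f"Dá à causa o valor {qualifier}de {format_brl(cents)}")
    return probes


if __name__ == "__main__":
    # 10.000,00 / 20.000,00 / 10.000,01 mirror Texts 1 to 3; the last is arbitrary.
    for text in build_probes([1_000_000, 2_000_000, 1_000_001, 123_456_789]):
        print(text)
```

Comparing which variants fail can show whether the problem is tied to particular amounts or to words inserted between "valor" and the amount.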
[Image: a sample of a tagged document]
Dataset:
- Training set: 250 documents (85%)
- Test set: 35 documents (12%)
- Blind set: 10 documents (3%)
- I have already tried other splits.
- All documents contain the entities and the relation once per document, with variations.
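The percentages above are consistent with 295 documents in total. Assuming each document contains exactly one mention of the entity, the test set holds only 35 mentions, so a single missed mention moves recall by almost 3 percentage points. A short Python check (the helper names are illustrative):

```python
# Sanity check of the dataset split (document counts taken from the list above).
COUNTS = {"training": 250, "test": 35, "blind": 10}


def split_percentages(counts):
    """Return each set's share of all documents, in percent."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


def recall_step(n_mentions):
    """Percentage points of recall gained or lost per single mention."""
    return 100 / n_mentions


if __name__ == "__main__":
    for name, pct in split_percentages(COUNTS).items():
        print(f"{name}: {pct:.1f}%")  # training: 84.7%, test: 11.9%, blind: 3.4%
    # Assumption: one mention per document, so 35 mentions in the test set.
    print(f"{recall_step(COUNTS['test']):.1f} points per missed mention")  # 2.9
```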
I have already tagged more documents with this scenario, but it didn't improve the results. I also tried tagging every currency value in the documents.
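Since the entity in question is a currency amount, one option, assuming you post-process the NLU response in your own code, is to cross-check the custom model against a deterministic pattern and flag sentences where the two disagree. This is a hypothetical helper (`BRL_PATTERN` and `find_brl_amounts` are illustrative names), not a WKS or NLU feature:

```python
import re

# Hypothetical pattern for Brazilian real amounts such as "R$ 10.000,00" or
# "R$ 1000,50": '.'-grouped thousands or a plain digit run, then ',' and 2 decimals.
BRL_PATTERN = re.compile(r"R\$\s?(?:\d{1,3}(?:\.\d{3})+|\d+),\d{2}\b")


def find_brl_amounts(text):
    """Return (start, end, surface) for every currency amount found in text."""
    return [(m.start(), m.end(), m.group()) for m in BRL_PATTERN.finditer(text)]


if __name__ == "__main__":
    samples = [
        "Dá à causa o valor de R$ 10.000,00",
        "Dá à causa o valor esperado de R$ 20.000,00",
    ]
    for sample in samples:
        print(find_brl_amounts(sample))
```

Amounts the pattern finds but the model misses are candidates for additional annotation. Note that the pattern does not cover amounts written out in words.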
What can I do to improve the results?