IBM Watson NLC - Training with more than 20,000 text examples?

91 Views Asked by Gabriel At 01 July 2025 at 23:54

We're currently developing a system that returns an ICD10-CM code (a medical diagnosis coding system) for a text input. For example:

input 'Black Eye'
return 'H44 - Disorders of the globe'

The problem is that ICD10-CM has 70,000 to 100,000 codes, so Watson NLC won't let me train the model after I upload all those text examples from .csv files.

Is using multiple models a solution, or should I switch to Google's AutoML?


Anton On 18 November 2019 at 09:34 BEST ANSWER

With 70-100k codes (classes), you will not be able to train a useful model with only 20k examples: most classes would have no example at all. For comparison, the ImageNet dataset has about 20k categories but also about 14 million examples.
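To make the comparison concrete, here is a minimal sketch that works out the average number of examples per class from the rounded figures quoted above. The function name `examples_per_class` is made up for illustration, and the inputs are the approximate numbers from this answer, not measurements.

```python
def examples_per_class(n_examples: int, n_classes: int) -> float:
    """Average number of training examples available per class."""
    return n_examples / n_classes


# Rounded figures quoted in this answer (assumptions, not measured data).
icd_best_case = examples_per_class(20_000, 70_000)
icd_worst_case = examples_per_class(20_000, 100_000)
imagenet = examples_per_class(14_000_000, 20_000)

print(f"ICD10-CM, 70k codes:  {icd_best_case:.2f} examples per class")
print(f"ICD10-CM, 100k codes: {icd_worst_case:.2f} examples per class")
print(f"ImageNet:             {imagenet:.0f} examples per class")
```

Under these assumptions the classifier would see well under one example per code on average, against roughly 700 per category for ImageNet.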

I do not know if ICD10-CM has broader categories, but if it does you could train a model to predict those.
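As far as I can tell, ICD10-CM codes do start with a three-character category (the `H44` in the question is one), with further characters adding detail. A rough sketch of relabeling the training data at that coarser level could look like this. The helper names `to_category` and `relabel_by_category` and the sample rows are hypothetical, and the code strings only illustrate the format.

```python
from collections import Counter


def to_category(code: str) -> str:
    """Return the three-character category of an ICD10-CM style code string."""
    normalized = code.strip().upper().replace(".", "")
    if len(normalized) < 3:
        raise ValueError(f"code too short to contain a category: {code!r}")
    return normalized[:3]


def relabel_by_category(rows):
    """Map (text, full_code) pairs to (text, category) pairs."""
    return [(text, to_category(code)) for text, code in rows]


# Hypothetical rows; the code strings only illustrate the format.
rows = [
    ("example text a", "H44.001"),
    ("example text b", "H44.19"),
    ("example text c", "s00.10xa"),
]
relabeled = relabel_by_category(rows)

# Count how many examples each category now has.
categories = Counter(category for _, category in relabeled)
print(categories)
```

Training on the category column instead of the full code shrinks the number of classes considerably, at the cost of returning a less specific code.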

Another option is to limit yourself to codes that occur at least 100 times in your examples and put all others in one class. This means there will be a lot of inputs for which you cannot return a code.
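The frequency cutoff can be sketched as follows. The function `collapse_rare_labels` and the catch-all label `"OTHER"` are names I made up. The threshold of 100 comes from the suggestion above, and the tiny demo uses a threshold of 2 only so the effect is visible on six labels.

```python
from collections import Counter


def collapse_rare_labels(labels, min_count=100, other_label="OTHER"):
    """Keep labels seen at least min_count times; replace the rest with other_label."""
    counts = Counter(labels)
    return [label if counts[label] >= min_count else other_label for label in labels]


# Hypothetical label column; real data would come from the .csv files.
labels = ["H44"] * 3 + ["S00"] * 2 + ["A00"]
collapsed = collapse_rare_labels(labels, min_count=2)

# Codes that survive the cutoff, and the share of examples that keep a real code.
kept = sorted(set(collapsed) - {"OTHER"})
coverage = sum(label != "OTHER" for label in collapsed) / len(collapsed)
print(kept, f"{coverage:.0%} of examples keep a real code")
```

Checking the coverage figure on your real data tells you how often the system would have to answer "no code" before you commit to this approach.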

In any case, I think using a model trained on only 20k examples for actual medical purposes would be dangerous.
