cTAKES correctly identifies 'lung cancer', 'basal cell cancer' and similar terms, that is, it returns the specific UMLS/SNOMED concept for them. But if the sentence contains "colorectal cancer", it only returns the generic 'Malignant Neoplasms'.
I have tried changing the window size in NeContextsSubPipe and adding ContextDependentTokenizerAnnotator, but cTAKES never identifies "colorectal cancer".
My pipeline (.piper file):
// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline.piper
// Add non-core annotators
add ContextDependentTokenizerAnnotator
// The POSTagger has a -complex- startup, but it can create its own description to handle it
addDescription POSTagger
//addDescription LvgAnnotator
addDescription ThreadSafeLvg
add DefaultJCasTermAnnotator
// Add Named Entity Context Entity Attribute annotators
load NeContextsSubPipe.piper
// Collect discovered Entity information for post-run access
collectEntities
Output for "colorectal cancer":
{'DiseaseDisorderMention': {'Malignant Neoplasms'}}
Output for "basal cell cancer or prostate cancer":
{'DiseaseDisorderMention': {'Basal cell carcinoma', 'Malignant Neoplasms', 'Malignant neoplasm of prostate'}}
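Grouping only by type hides which text span produced the generic 'Malignant Neoplasms'. A per-span view can make it easier to see whether "colorectal cancer" was ever matched to a specific concept. The sketch below assumes the mentions have already been pulled out of the cTAKES output as (type, covered text, preferred text) tuples. That tuple shape and the functions `group_by_span` and `generic_only` are hypothetical helpers, not part of cTAKES, and the sample data is illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record shape: (semantic_type, covered_text, preferred_text).
# cTAKES does not emit this tuple directly; extract it from your own output first.
Mention = Tuple[str, str, str]


def group_by_span(mentions: Iterable[Mention]) -> Dict[str, Dict[str, Set[str]]]:
    """Group preferred texts by semantic type, then by the covered text span."""
    grouped: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for sem_type, covered, preferred in mentions:
        grouped[sem_type][covered].add(preferred)
    # Convert the nested defaultdicts to plain dicts for printing and comparison.
    return {sem_type: dict(spans) for sem_type, spans in grouped.items()}


def generic_only(grouped: Dict[str, Dict[str, Set[str]]], generic: Set[str]) -> List[str]:
    """Return covered spans whose preferred texts are all in the `generic` set."""
    return sorted(
        covered
        for spans in grouped.values()
        for covered, preferred in spans.items()
        if preferred <= generic
    )


if __name__ == "__main__":
    # Illustrative input only, not real cTAKES output.
    sample = [
        ("DiseaseDisorderMention", "colorectal cancer", "Malignant Neoplasms"),
        ("DiseaseDisorderMention", "prostate cancer", "Malignant neoplasm of prostate"),
    ]
    grouped = group_by_span(sample)
    print(grouped)
    print(generic_only(grouped, {"Malignant Neoplasms"}))
```

A span that appears in the `generic_only` list never received anything more specific than the generic concept, which narrows down where to look.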