Train a non-English Stanford NER model

545 Views Asked by Abdaoui Amine At 05 June 2025 at 00:48

I'm seeing several posts about training the Stanford NER for other languages.

E.g.: https://blog.sicara.com/train-ner-model-with-nltk-stanford-tagger-english-french-german-6d90573a9486

However, the Stanford CRFClassifier uses some language-dependent features (such as part-of-speech tags).

Can we really train non-English models using the same Jar file? https://nlp.stanford.edu/software/crf-faq.html

There are 2 best solutions below

O. Kaminska On 28 March 2019 at 13:33

I agree with the previous comment that an NER classification model is language-independent.

If you have issues with training data, I can suggest this link, which has a large number of labeled datasets for different languages.

If you would like to try another model, I suggest EstNLTK, a library for the Estonian language that can also fit language-independent NER models (documentation). Also, here you can find an example of how to train an NER model using spaCy.

I hope it helps. Good luck!

Paprikamann On 10 October 2018 at 12:55

Training an NER classifier is language-independent. You have to provide high-quality training data and create meaningful features. The point is that not all features are equally useful for every language. Capitalization, for instance, is a good indicator of a named entity in English. But in German all nouns are capitalized, which makes this feature less useful.
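To make the capitalization point concrete, here is a minimal sketch in Python. The two sentences and their entity labels are made up for illustration (they are not from any corpus), and `capitalization_precision` is a hypothetical helper, not part of Stanford NER. It measures how often a capitalized, non-sentence-initial token is actually a named entity.

```python
def capitalization_precision(tokens, entity_flags):
    """Fraction of capitalized non-initial tokens that are named entities."""
    hits = 0
    total = 0
    # Skip the first token: sentence-initial capitalization carries no signal.
    for token, is_entity in list(zip(tokens, entity_flags))[1:]:
        if token[:1].isupper():
            total += 1
            hits += int(is_entity)
    return hits / total if total else 0.0


# Toy, hand-labeled examples (illustrative only).
english = (
    "Yesterday Angela bought a car in Berlin".split(),
    [False, True, False, False, False, False, True],
)
german = (
    "Gestern kaufte Angela ein Auto in Berlin".split(),
    [False, False, True, False, False, False, True],
)

en_precision = capitalization_precision(*english)
de_precision = capitalization_precision(*german)
print(f"English: {en_precision:.2f}, German: {de_precision:.2f}")
```

In the German sentence the common noun "Auto" is capitalized too, so the same feature fires on a non-entity and becomes a weaker signal.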

In Stanford NER you can decide which features the classifier uses, so you can disable POS tags (in fact, they are disabled by default). Of course, you could also provide your own POS tags in your desired language.
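As a rough sketch of what that feature selection can look like, the Python snippet below writes a training properties file modeled on the example in the Stanford NER CRF FAQ. The file names (`french-train.tsv`, `french-ner-model.ser.gz`, `french.prop`) and the helper `build_ner_properties` are assumptions for illustration. The `useTags` flag and the `tag` column in `map` are my understanding of how POS tags are switched on, so check them against the documentation for your version.

```python
def build_ner_properties(train_file, model_file, use_pos_tags=False):
    """Render a Stanford NER training properties file as a string."""
    # Column layout of the tab-separated training file.
    # With POS tags, an extra "tag" column sits between word and label.
    column_map = "word=0,tag=1,answer=2" if use_pos_tags else "word=0,answer=1"
    props = {
        "trainFile": train_file,
        "serializeTo": model_file,
        "map": column_map,
        # Language-independent features, as in the FAQ example.
        "useClassFeature": "true",
        "useWord": "true",
        "useNGrams": "true",
        "noMidNGrams": "true",
        "maxNGramLeng": "6",
        "usePrev": "true",
        "useNext": "true",
        "useSequences": "true",
        "usePrevSequences": "true",
        "maxLeft": "1",
        "useTypeSeqs": "true",
        "useTypeSeqs2": "true",
        "useTypeySequences": "true",
        "wordShape": "chris2useLC",
        "useDisjunctive": "true",
        # POS-tag features: off unless you supply tags for your language.
        "useTags": "true" if use_pos_tags else "false",
    }
    return "\n".join(f"{key} = {value}" for key, value in props.items()) + "\n"


# Hypothetical file names for a French model without POS tags.
properties_text = build_ner_properties("french-train.tsv", "french-ner-model.ser.gz")
with open("french.prop", "w", encoding="utf-8") as handle:
    handle.write(properties_text)
```

Training would then run with the same JAR as for English, along the lines of the FAQ: `java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop french.prop`.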

I hope I could clarify some things.
