Compare NER libraries: Stanford CoreNLP, spaCy, and Google Cloud


I want to recognise person names in text, but I am confused about which NLP library to use for NER. I have found the following libraries recommended for NER: 1. Stanford CoreNLP 2. spaCy 3. Google Cloud.

I am unable to work out which library will give the most accurate results and the best performance. Please help me here.


2 Answers

Best Answer (1 vote)

spaCy is an industrial-strength NLP library and is both fast and accurate at NER. It also ships with multilingual models. Check spaCy.

AllenNLP also comes with a state-of-the-art NER model, but it is slightly more complex to use. Check the AllenNLP demo.

If a paywall is not an issue, then I would suggest going with Google's Cloud Natural Language API (it is, of course, fast and accurate).

I have personally used spaCy and AllenNLP. I would say go with spaCy if you are just getting started.

Hope this helps.

Answer (0 votes)

TL;DR: Simply pick an existing system which seems easy for you to implement and seems to have reasonable accuracy. This can be either a cloud offering (for example, IBM Watson Conversation or Google Dialogflow) or a library or executable (for example, Rasa NLU or the Natural Language Toolkit). Choosing a system solely on accuracy is non-trivial, and if you always want the best, you will have to switch between systems often.

Your question asks which system will give the most accurate results, in your case for recognizing a person's name in text, while not requiring too much computational power. The natural language processing (NLP) field is changing rapidly. To show this, we can look at the current state of the art (SOTA) for named-entity recognition (NER). This GitHub page has a nice summary for the CoNLL-2003 NER dataset; I will copy it here and use company names since they are easier to remember:

  1. Zalando. F1 score: 0.931. Date: 24 June 2018
  2. Google. F1 score: 0.928. Date: 31 October 2018
  3. Stanford / Google Brain. F1 score: 0.926. Date: 22 September 2018

Based on this list we can observe that, as of the start of 2019, a new SOTA result is obtained every few months. See https://rajpurkar.github.io/SQuAD-explorer/ for an updated list of benchmarks on a complex NLP task. So, since the SOTA algorithm changes every few months, "the most accurate system (library)" also changes often. Furthermore, the accuracy on your data depends not only on the system, but also on the following:

  • Algorithm used. It could be that Google has published SOTA research but has not implemented it in its product. The only way to find out for sure is to continually test all the systems.
  • Training data size. Although bigger is generally better, some algorithms handle small numbers of examples (few-shot learning) better than others.
  • Domain. An algorithm may be better suited to formal governmental text than to less formal Wikipedia text.
  • Data language. Since most research focuses on showing SOTA results on public data sets, systems are often optimized for English; how they perform on other languages may differ.
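Since the only reliable way to choose is to test the candidate systems on your own labeled data, a small sketch of entity-level precision, recall, and F1 (the metric used in the leaderboard above) might look like this; the entity spans and labels below are invented for illustration:

```python
# Minimal sketch of entity-level precision/recall/F1 for comparing NER
# systems on your own labeled data. Entities are (start, end, label)
# tuples; a prediction counts only as an exact match.

def ner_f1(gold, predicted):
    """Return (precision, recall, f1) over sets of entity annotations."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # exact span + label matches
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up example: the second predicted span is truncated, so it misses.
gold = [(0, 12, "PERSON"), (18, 31, "PERSON")]
pred = [(0, 12, "PERSON"), (18, 25, "PERSON")]
p, r, f = ner_f1(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.5 0.5 0.5
```

Running each system's output through the same scorer on a held-out sample of your own data tells you far more than a public leaderboard does.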

Given all these considerations, I would advise picking an existing system and choosing it based on broader requirements such as pricing and ease of use.