spaCy Alternatives in Java


I currently use spaCy to traverse the dependency tree, and generate entities.

def extract_entities(unicode_text):
    # get_spacy_model and detect_lang are my own helper functions
    nlp = get_spacy_model(detect_lang(unicode_text))
    doc = nlp(unicode_text)

    entities = set()
    for sentence in doc.sents:
        # traverse the tree, picking up entities
        for token in sentence.subtree:
            # pick entities using some pre-defined rules
            ...

    entities.discard('')
    return entities

Are there any good Java alternatives for spaCy?

I am looking for libs which generate the Dependency Tree as is done by spaCy.

EDIT:

I looked into the Stanford Parser. However, it generated the following constituency parse tree:

                     ROOT
                      |
                      NP
       _______________|_________
      |                         NP
      |                _________|___
      |               |             PP
      |               |     ________|___
      NP              NP   |            NP
  ____|__________     |    |     _______|____
 DT   JJ    JJ   NN  NNS   IN   DT      JJ   NN
 |    |     |    |    |    |    |       |    |
the quick brown fox jumps over the     lazy dog

What I am looking for is a dependency tree like the one spaCy produces:

                             jumps_VBZ
   __________________________|___________________
  |       |        |         |      |         over_IN
  |       |        |         |      |            |
  |       |        |         |      |          dog_NN
  |       |        |         |      |     _______|_______
The_DT quick_JJ brown_JJ   fox_NN  ._. the_DT         lazy_JJ
There are 4 best solutions below

3
On

You're looking for the Stanford Dependency Parser. Like most of the Stanford tools, it is bundled with Stanford CoreNLP, under the depparse annotator. Other parsers include the Malt parser (a feature-based shift-reduce parser) and Ryan McDonald's MST parser (an accurate but slower maximum spanning tree parser).
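To make the difference from the constituency tree concrete, here is a minimal sketch of the structure a dependency parser gives you: every token points at its head, and the root has none. It does not call CoreNLP or any other parser. The arcs are copied by hand from the spaCy tree in the question, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of CoreNLP, Malt, or MST.
class DependencyTreeSketch {
    // Tokens and tags copied from the spaCy tree in the question.
    static final String[] WORDS = {"The", "quick", "brown", "fox", "jumps",
                                   "over", "the", "lazy", "dog", "."};
    static final String[] TAGS = {"DT", "JJ", "JJ", "NN", "VBZ",
                                  "IN", "DT", "JJ", "NN", "."};
    // HEADS[i] is the index of token i's head; -1 marks the root.
    static final int[] HEADS = {4, 4, 4, 4, -1, 4, 8, 8, 5, 4};

    // Index of the root token (the one without a head).
    static int root() {
        for (int i = 0; i < HEADS.length; i++) {
            if (HEADS[i] == -1) return i;
        }
        throw new IllegalStateException("no root token");
    }

    // Direct dependents of a token, in sentence order.
    static List<Integer> children(int head) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < HEADS.length; i++) {
            if (HEADS[i] == head) result.add(i);
        }
        return result;
    }

    // The token itself plus all its descendants, in sentence order.
    static List<Integer> subtree(int head) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < HEADS.length; i++) {
            // Walk up the head chain from token i; keep i if we pass through head.
            for (int node = i; node != -1; node = HEADS[node]) {
                if (node == head) {
                    result.add(i);
                    break;
                }
            }
        }
        return result;
    }

    // Print the tree depth-first, one token per line, indented by depth.
    static void print(int node, int depth) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) line.append("  ");
        line.append(WORDS[node]).append('_').append(TAGS[node]);
        System.out.println(line);
        for (int child : children(node)) print(child, depth + 1);
    }

    public static void main(String[] args) {
        print(root(), 0);
    }
}
```

Traversing entities by rule then comes down to walking children or subtree from the root, which is the same kind of loop as the Python in the question.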

0
On

I recently released spaCy4j, which mimics spaCy's Token container objects and integrates with spacy-server or CoreNLP.

Once you have a Docker container of spacy-server running (very easy to set up), it's as simple as:

// Create a new spacy-server adapter with host and port matching a running instance of spacy-server.
SpaCyAdapter adapter = SpaCyServerAdapter.create("localhost", 8080);

// Create a new SpaCy object. It is thread safe and should be reused across your app.
SpaCy spacy = SpaCy.create(adapter);

// Parse a doc
Doc doc = spacy.nlp("My head feels like a frisbee, twice its normal size.");

// Inspect tokens
for (Token token : doc.tokens()) {
    System.out.printf("Token: %s, Tag: %s, Pos: %s, Dependency: %s%n", 
            token.text(), token.tag(), token.pos(), token.dependency());
}

Feel free to contact me via GitHub with any questions.

1
On

spaCy can be run from a Java program.

First create the environment by running the following commands at the command prompt:

python3 -m venv env
source ./env/bin/activate
pip install -U spacy
python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm

Create a bash file, spacyt.sh, next to the env folder, with the following commands:

#!/bin/bash
source ./env/bin/activate
python test1.py

Put the spaCy code in a Python script, test1.py:

import spacy

print('This is a test script of spacy')
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence")
print([(w.text, w.pos_) for w in doc])

Instead of printing, the script can write its results to a file for further processing.

In the Java program, run the bash file:

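String cmd = "./spacyt.sh";

try {
    // inheritIO() forwards the script's output to this JVM's console,
    // so the child process cannot block on a full output pipe
    Process p = new ProcessBuilder(cmd).inheritIO().start();
    int exitCode = p.waitFor();
    System.out.println(cmd + " exited with code " + exitCode);
} catch (Exception e) {
    e.printStackTrace();
}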
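
If the Java side needs the script's output directly rather than through a file, one option is to read the process's standard output. Below is a minimal sketch of that approach. The class and method names are made up for illustration, and it assumes spacyt.sh is executable and sits in the working directory. The output is read before waitFor() is called, so the script cannot stall on a full pipe.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the name is for illustration only.
class ScriptRunnerSketch {
    // Runs a command and returns everything it printed, one entry per line.
    static List<String> run(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException(
                        "Command exited with code " + exitCode + ": " + lines);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the command", e);
        }
    }

    public static void main(String[] args) {
        // Assumes ./spacyt.sh exists as described in this answer.
        for (String line : run("./spacyt.sh")) {
            System.out.println(line);
        }
    }
}
```

Each returned line is whatever test1.py printed, so if the script prints one token per line (or JSON), the Java code can parse it from there.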
0
On

Another way to integrate spaCy with Java and other languages is to use a spaCy REST API. For example, https://github.com/jgontrum/spacy-api-docker provides a Dockerized spaCy REST API.
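
From Java, such a service can be called with the JDK's built-in HTTP client (Java 11+). The sketch below is only a starting point: the /dep path, the "text" and "model" JSON fields, and the localhost:8080 address are assumptions on my part, so check the project's README for the actual endpoints and port mapping. The class name is made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client sketch; endpoint path and JSON fields are assumptions.
class SpacyRestClientSketch {
    // Escapes a string so it can be placed inside a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds a POST request carrying the text to parse as a JSON body.
    static HttpRequest buildRequest(String baseUrl, String text, String model) {
        String body = "{\"text\": \"" + jsonEscape(text)
                + "\", \"model\": \"" + jsonEscape(model) + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/dep")) // assumed path
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumes the container is reachable at localhost:8080.
        HttpRequest request = buildRequest("http://localhost:8080",
                "The quick brown fox jumps over the lazy dog.", "en");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

The response body is printed as raw text here; in a real application you would hand it to a JSON library to get at the words and arcs.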