How to combine two sentences using simplenlg

Given a set of sentences like "John has a cat" and "John has a dog", I would like to create a sentence like "John has a cat and dog".

Can I use SimpleNLG to do this?

The task you are asking about is called aggregation in Natural Language Generation (NLG). Whilst SimpleNLG does support aggregation with its realisation engine, it will not directly aggregate two strings such as those in your example.

It is possible, however, to combine a syntactic parser with SimpleNLG to perform this task. I will first explain how to build your target sentence from SimpleNLG phrase objects:

import simplenlg.framework.*;
import simplenlg.lexicon.*;
import simplenlg.realiser.english.*;
import simplenlg.phrasespec.*;
import simplenlg.features.*;

public class TestMain {

  public static void main(String[] args) throws Exception {
    Lexicon lexicon = Lexicon.getDefaultLexicon();
    NLGFactory nlgFactory = new NLGFactory(lexicon);
    Realiser realiser = new Realiser(lexicon);

    // Create the SPhraseSpec object (sentence phrase).
    SPhraseSpec p = nlgFactory.createClause();

    // Create a noun phrase and set it as the subject of your sentence
    NPPhraseSpec john = nlgFactory.createNounPhrase("John");
    p.setSubject(john);

    // Create a verb phrase and set it as the verb of your sentence
    VPPhraseSpec have = nlgFactory.createVerbPhrase("have");
    // Note that the verb is "have" not "has".  Have is the base lemma.
    // The morphology of this will be handled based on the tense you set (see below)
    p.setVerb(have);

    // Create a determiner 'a'
    NPPhraseSpec a = nlgFactory.createNounPhrase("a");

    // Create two more noun phrases

    // One for cat
    NPPhraseSpec cat = nlgFactory.createNounPhrase("cat");
    // set the determiner
    cat.setDeterminer(a);

    // And one for dog.
    NPPhraseSpec dog = nlgFactory.createNounPhrase("dog");
    // set the determiner
    dog.setDeterminer(a);

    // Create a coordinated phrase
    // This tells SimpleNLG that these objects are a collection which should be aggregated
    CoordinatedPhraseElement coord = nlgFactory.createCoordinatedPhrase(cat, dog);

    // Set the coordinated phrase as the object of your sentence
    p.setObject(coord);

    // Realise the clause as a sentence and print it
    String output = realiser.realiseSentence(p);
    System.out.println(output);
    // => John has a cat and a dog.

    // Now let's see what SimpleNLG can do!

    // Change the tense to past (present was the default)
    // In SimpleNLG v4 the tense is set as a feature on the clause
    p.setFeature(Feature.TENSE, Tense.PAST);
    output = realiser.realiseSentence(p);
    System.out.println(output);
    // => John had a cat and a dog.

    // Change the tense to future
    p.setFeature(Feature.TENSE, Tense.FUTURE);
    output = realiser.realiseSentence(p);
    System.out.println(output);
    // => John will have a cat and a dog.
  }
}

That is how you work with language in the SimpleNLG realiser. It does not, however, answer your question of aggregating two strings directly. There may be other ways, but my first thought is to use a syntactic parser such as StanfordNLP or spaCy.

I use spaCy (a Python library) in my own work, so I will show a brief example of what I mean here.

import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp(u'John has a cat')

for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)

This outputs:

John john PROPN NNP nsubj Xxxx True False
has have VERB VBZ ROOT xxx True True
a a DET DT det x True True
cat cat NOUN NN dobj xxx True False

You can see from the output that each token in the sentence has been tagged with its part of speech (noun, verb, determiner, etc.) and its dependency role (subject, root verb, direct object). You could use this information to format the input for SimpleNLG and then aggregate your sentences. Rather than hand-coding the grammar in Java, I would suggest the XMLRealiser available in SimpleNLG, which takes XML as input.
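As a rough sketch of that idea, the snippet below groups parsed clauses that share a subject and verb, so their objects can later be handed to SimpleNLG as one coordinated phrase. The helper names `extract_clause` and `aggregate` are hypothetical, and the token rows are hard-coded in the shape of the spaCy output above so the sketch runs without spaCy installed. It assumes simple sentences with exactly one `nsubj`, one `ROOT` verb and one `dobj`, and at most one determiner, which will not hold for real text.

```python
from collections import OrderedDict

# Rows shaped like the spaCy output above: (text, lemma, dependency label).
# Hard-coded here (an assumption for illustration) instead of calling nlp().
SENTENCE_1 = [("John", "john", "nsubj"), ("has", "have", "ROOT"),
              ("a", "a", "det"), ("cat", "cat", "dobj")]
SENTENCE_2 = [("John", "john", "nsubj"), ("has", "have", "ROOT"),
              ("a", "a", "det"), ("dog", "dog", "dobj")]


def extract_clause(rows):
    """Pull subject, verb lemma, determiner and object out of one sentence.

    Assumes one nsubj, one ROOT verb, one dobj and at most one det.
    """
    clause = {"subject": None, "verb": None, "determiner": None, "object": None}
    for text, lemma, dep in rows:
        if dep == "nsubj":
            clause["subject"] = text   # keep the surface form ("John")
        elif dep == "ROOT":
            clause["verb"] = lemma     # SimpleNLG expects the base form ("have")
        elif dep == "det":
            clause["determiner"] = text
        elif dep == "dobj":
            clause["object"] = lemma
    return clause


def aggregate(clauses):
    """Collect the objects of all clauses sharing the same subject and verb."""
    grouped = OrderedDict()
    for c in clauses:
        key = (c["subject"], c["verb"])
        grouped.setdefault(key, []).append((c["determiner"], c["object"]))
    return grouped


if __name__ == "__main__":
    clauses = [extract_clause(SENTENCE_1), extract_clause(SENTENCE_2)]
    for (subject, verb), objects in aggregate(clauses).items():
        print(subject, verb, objects)
    # prints: John have [('a', 'cat'), ('a', 'dog')]
```

Each group then maps onto the Java example: the subject goes to `setSubject`, the verb lemma to `createVerbPhrase`, and the list of objects to `createCoordinatedPhrase`.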

NLP/NLG work is not trivial, because language is very complex. The above is just one way of approaching such a task. Tools might exist which aggregate based on strings alone, but SimpleNLG is only a surface realiser, so you would have to present it with input data in a suitable format as shown above.