NLG - Create text descriptions with simplenlg

I'm trying to generate product descriptions with the help of NLG. For example, if I specify the properties of a product (say, a mobile phone) such as its OS, RAM, processor, display and battery, it should output a readable description of the phone. I see there are some paid services (Quill, Wordsmith, etc.) which do the same. Then I came across simplenlg, the open source Java API for NLG. I can see how to create sentences by specifying the sentence phrases and features (such as tense or interrogation), but I don't see an option to create a description from texts.

Does anyone know how to create a text description from words with simplenlg?

Are there any other tools/frameworks/APIs available to accomplish this task (not limited to Java)?

There is 1 best solution below

SimpleNLG is primarily a surface realizer. It requires well-formatted input, but given that, it can perform tasks such as changing the tense of a sentence. An explanation of the types of task which a realizer can perform can be found at the above link.

Generating sentences like those you describe would require additional components to handle document planning and microplanning. The exact boundaries between these components are blurred, but broadly speaking you define what you want to say in a document plan, then have the microplanner perform tasks such as referring expression generation (choosing whether to say 'it' rather than 'the mobile phone') and aggregation, which is the merging of sentences. SimpleNLG has some support for aggregation.
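To make the two microplanning tasks concrete, here is a minimal sketch in plain Java. It does not use SimpleNLG; the class `MicroplannerSketch` and its methods are hypothetical names invented for illustration, and the "document plan" is assumed to be nothing more than groups of ready-made verb phrases about a single product.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of SimpleNLG.
class MicroplannerSketch {

    /** Aggregation: merge several verb phrases into one coordinated phrase. */
    static String aggregate(List<String> predicates) {
        List<String> parts = new ArrayList<>(predicates);
        if (parts.isEmpty()) return "";
        if (parts.size() == 1) return parts.get(0);
        String head = String.join(", ", parts.subList(0, parts.size() - 1));
        return head + " and " + parts.get(parts.size() - 1);
    }

    /** Referring expression generation: full name first, pronoun afterwards. */
    static String refer(String name, boolean firstMention) {
        return firstMention ? name : "It";
    }

    /** Turn each group of predicates in the plan into one sentence. */
    static String generate(String name, List<List<String>> plan) {
        StringBuilder text = new StringBuilder();
        boolean first = true;
        for (List<String> group : plan) {
            if (group.isEmpty()) continue; // nothing to say for this group
            if (text.length() > 0) text.append(' ');
            text.append(refer(name, first)).append(' ')
                .append(aggregate(group)).append('.');
            first = false;
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<List<String>> plan = Arrays.asList(
            Arrays.asList("runs iOS11", "has 2GB of RAM"),
            Arrays.asList("costs $649 retail for the 32GB model"));
        // Prints: The iPhone 7 runs iOS11 and has 2GB of RAM. It costs $649 retail for the 32GB model.
        System.out.println(generate("The iPhone 7", plan));
    }
}
```

Even this toy version has to decide how facts are grouped into sentences and when a pronoun is safe, which are exactly the decisions a real microplanner makes with far more context.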

It is also worth noting that this three-stage process (document planning, microplanning and surface realization) is not the only way to perform NLG; it is just a common one.

There is no magic solution I am aware of that takes information from an arbitrary domain and generates readable and meaningful text. In your mobile phone example, it would be trivial to chain descriptions together and form something like:

The iPhone 7 has iOS11, 2GB RAM, a 1960 mA·h Li-ion battery and a $649 retail cost for the 32GB model.

But this would just be simple string concatenation or interpolation from your data. It does not account for nuance like the question of whether it would be better to say:

The iPhone 7 runs iOS11, has 2GB of RAM and is powered by a 1960 mA·h Li-ion battery. It costs $649 retail for the 32GB model.

In this second example I have adjusted verbs (and therefore noun phrases), used the referring expression of 'it' and split our long sentence in two (with some further changes because of the split). Making these changes requires knowledge (and therefore computational rules) of the words and their usage within the domain. It becomes non-trivial very quickly.
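As a rough illustration of what such domain rules might look like, the sketch below maps phone attributes to verb phrases. The class name, attribute keys and patterns are all assumptions made up for this example; a real system would need many more rules, plus handling for agreement, units and so on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; the attribute keys and patterns are invented.
class DomainLexiconSketch {

    // Domain knowledge: which verb phrase suits which phone attribute.
    static final Map<String, String> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("os", "runs %s");
        PATTERNS.put("ram", "has %s of RAM");
        PATTERNS.put("battery", "is powered by a %s battery");
        PATTERNS.put("price", "costs %s");
    }

    /** Build a verb phrase for one attribute, falling back to a generic "has". */
    static String verbPhrase(String attribute, String value) {
        String pattern = PATTERNS.getOrDefault(attribute, "has %s");
        return String.format(pattern, value);
    }

    public static void main(String[] args) {
        System.out.println(verbPhrase("os", "iOS11"));                   // attribute with its own rule
        System.out.println(verbPhrase("display", "a 4.7-inch display")); // no rule, generic fallback
    }
}
```

Every entry in `PATTERNS` is specific to phones, which is why rules like these do not carry over to another domain for free.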

If your requirements are as simple as 5 or 6 pieces of information about a phone, you could probably do it pretty well without NLG software: just create some kind of template and make sure all of your data makes sense when inserted. As soon as you go beyond mobile phones, however, to describe cars for example, you would need to do all this work again for the new domain.
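A template approach along those lines could be sketched as follows. This is plain Java with no NLG library; the `{slot}` syntax, the class `PhoneTemplateSketch` and the `fill` method are hypothetical choices made for this example, and the validation is deliberately crude.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; the {slot} syntax is invented for this sketch.
class PhoneTemplateSketch {

    static final String TEMPLATE =
        "The {name} runs {os}, has {ram} of RAM and is powered by a {battery} battery. "
        + "It costs {price} retail for the {storage} model.";

    /** Fill every {slot} from the data; fail loudly on blank or missing values. */
    static String fill(String template, Map<String, String> data) {
        String text = template;
        for (Map.Entry<String, String> e : data.entrySet()) {
            String value = e.getValue();
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException("Missing value for " + e.getKey());
            }
            // String.replace is literal, so values such as "$649" are safe.
            text = text.replace("{" + e.getKey() + "}", value);
        }
        // Crude check: assumes no legitimate value contains a '{' character.
        if (text.contains("{")) {
            throw new IllegalArgumentException("Unfilled slot in: " + text);
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> phone = new LinkedHashMap<>();
        phone.put("name", "iPhone 7");
        phone.put("os", "iOS11");
        phone.put("ram", "2GB");
        phone.put("battery", "1960 mAh Li-ion");
        phone.put("price", "$649");
        phone.put("storage", "32GB");
        System.out.println(fill(TEMPLATE, phone));
    }
}
```

The template encodes all the linguistic choices by hand, so it reads well for phones but would need to be rewritten for any other kind of product.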

It would be worthwhile to look at the blog of Ehud Reiter, the initial author of SimpleNLG. There are also papers such as Albert Gatt and Emiel Krahmer's "Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation". The latter is a bit dense if you are only dabbling in a little programming, but it does give an account of what NLG is, what it can do and what its current limitations are.