Natural Language Generation - how to go beyond templates


We've built a system that analyzes some data and outputs the results in plain English (i.e. text only, no charts). The current implementation relies on lots of templates and some randomization to give the text as much diversity as possible.

We'd like to switch to something more advanced, in the hope that the generated text will be less repetitive and sound less robotic. I've searched a lot on Google but can't find anything concrete to start from. Any ideas?

EDIT: The data fed to the NLG mechanism are in JSON format. Here is an example involving web analytics data. The JSON file may contain, for example, a metric (e.g. visits), its value over the last X days, whether the latest value is expected or not, and which dimensions (e.g. countries or marketing channels) affected its change.
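For illustration, a hypothetical input record might look like the following (all field names are invented for this example; the real schema may differ):

```python
# Hypothetical input record for the NLG system (field names are invented
# for illustration; the real JSON schema may differ).
record = {
    "metric": "visits",
    "value": 10000,
    "change_pct": 20,        # day-over-day change
    "expected": False,       # latest value deviates from the forecast
    "deviation_pct": 10,     # how far above the expected value, in percent
    "dimensions": {
        "country": "UK",
        "channel": "ABC email campaign",
        "landing_page": "XXX",
    },
}
```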

The current implementation could give something like this:

Overall visits in the UK mainly from ABC email campaign reached 10K (+20% DoD) and were above the expected value by 10%. Users were mainly landing on XXX page while the increase was consistent across devices.

We're looking for a way to depend less on templates, sound even more natural, and use a richer vocabulary.

2 Answers

Answer 1

Have you tried neural networks, especially LSTM and GRU architectures? These models are the most recent development in predicting sequences of words. Generating natural language means generating a sequence of words that makes sense with respect to the input and to the earlier words in the sequence; this is equivalent to predicting a time series. LSTMs are designed for predicting time series, so they are commonly used to predict a sequence of words given an input sequence, an input word, or any other input that can be embedded in a vector.

Deep learning libraries such as TensorFlow, Keras, and Torch all have sequence-to-sequence implementations that can be used to generate natural language by predicting a sequence of words from an input.
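As a rough sketch, this is what the standard encoder-decoder recipe looks like in Keras, with placeholder vocabulary sizes and dimensions (a minimal illustration, not a data-to-text system tuned for your JSON):

```python
# Minimal encoder-decoder (seq2seq) sketch in Keras; assumes one-hot
# encoded input and output sequences, and all sizes are placeholders.
from tensorflow.keras.layers import Dense, Input, LSTM
from tensorflow.keras.models import Model

num_encoder_tokens = 256   # input vocabulary size (placeholder)
num_decoder_tokens = 256   # output vocabulary size (placeholder)
latent_dim = 128           # LSTM state size (placeholder)

# Encoder: read the input sequence and keep only its final states.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: generate the output sequence, conditioned on the encoder states.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_outputs, _, _ = LSTM(
    latent_dim, return_sequences=True, return_state=True
)(decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens, activation="softmax")(decoder_outputs)

# Trained with teacher forcing on (input sequence, target sequence) pairs.
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
```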

Note that these models usually need a huge amount of training data.

You need to meet two criteria in order to benefit from such models:

  1. You should be able to represent your input as a vector (see the sketch after this list).
  2. You need a relatively large number of input/target pairs.
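On the first point, here is a hypothetical way to flatten a JSON record like the one in the question into a vector (vocabularies, field names, and scaling are made up for illustration):

```python
# Hypothetical featurization of a web-analytics JSON record into a flat
# vector. Vocabularies, field names, and scaling are invented placeholders.
COUNTRIES = ["UK", "US", "DE"]
CHANNELS = ["ABC email campaign", "organic", "paid search"]

def to_vector(record):
    vec = [
        record["value"] / 1e5,               # scaled metric value
        record["change_pct"] / 100.0,        # day-over-day change
        1.0 if record["expected"] else 0.0,  # expectation flag
    ]
    # One-hot encode the categorical dimensions.
    dims = record["dimensions"]
    vec += [1.0 if dims["country"] == c else 0.0 for c in COUNTRIES]
    vec += [1.0 if dims["channel"] == c else 0.0 for c in CHANNELS]
    return vec

print(to_vector({
    "value": 10000, "change_pct": 20, "expected": False,
    "dimensions": {"country": "UK", "channel": "ABC email campaign"},
}))
```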

Answer 2

What you are looking for is a hot research area and a pretty tough task. Currently there is no way to generate sentences that are 100% meaningful, diverse, and natural. One approach is to use n-grams: sentences generated this way tend to look more natural and diverse, but they are often meaningless or grammatically incorrect. A more up-to-date approach is deep learning. In any case, if the sentences must be meaningful, your current template-based method may still be your best option. You can find an introduction to the basics of n-gram-based NLG here: Generating Random Text with Bigrams
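To make the bigram idea concrete, here is a toy sketch (not the linked tutorial's code): each next word is drawn at random from the words that followed the current word in a training text.

```python
# Toy bigram text generator: pick each next word at random from the words
# observed to follow the current word. Corpus and output are illustrative.
import random
from collections import defaultdict

corpus = (
    "overall visits reached ten thousand . visits were above the expected "
    "value . users were mainly landing on the home page ."
).split()

# Bigram table: word -> list of observed successors.
successors = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    successors[current_word].append(next_word)

def generate(start, length=12):
    words = [start]
    for _ in range(length):
        candidates = successors.get(words[-1])
        if not candidates:  # dead end: no observed successor
            break
        words.append(random.choice(candidates))
    return " ".join(words)

print(generate("visits"))
```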

This tool seems to implement some of the best-known techniques for natural language generation: simplenlg