How can I generate a meaningful sentence from a list of words only?

I want to generate a sentence from a list of words. I have tried an n-gram model, but it only continues an existing sentence: you input a sentence and it outputs the next words based on the value of n. Which model would help generate a meaningful sentence from only a list of words, and which dataset should be used to train it?

There are 4 answers below.

Answer 1

The dataset: take any dataset consisting of sentences. Tokenize each sentence and shuffle its tokens. The shuffled tokens are your input and the original sentence is the output, so you can generate as many samples as you wish:

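import nltk
from random import shuffle

def create_input(sentence):
    tokens = nltk.word_tokenize(sentence)  # needs NLTK's punkt tokenizer data
    shuffle(tokens)  # shuffle in place: the model sees only an unordered bag of words
    return tokens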

The model is the more difficult part: you could try fine-tuning a BERT model, and I guess it would probably work.
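
Because every sentence can be shuffled several times, a small corpus can be expanded into many training pairs. The sketch below is a minimal illustration under stated assumptions: it splits on whitespace instead of calling nltk.word_tokenize (to stay dependency-free), and `make_training_pairs` and the `words:` input prefix are hypothetical names introduced here, not part of any library.

```python
import random

def make_training_pairs(sentences, shuffles_per_sentence=3, seed=0):
    """Build (shuffled words, original sentence) pairs.

    Assumption: whitespace tokenization stands in for nltk.word_tokenize,
    and "words: " is a hypothetical input prefix.
    """
    rng = random.Random(seed)  # seeded so the generated dataset is reproducible
    pairs = []
    for sentence in sentences:
        tokens = sentence.split()
        for _ in range(shuffles_per_sentence):
            shuffled = tokens[:]  # copy, so the target keeps the original order
            rng.shuffle(shuffled)
            source = "words: " + " ".join(shuffled)
            pairs.append((source, sentence))
    return pairs

corpus = ["The cat sat on the mat.", "I walk to school every day."]
for source, target in make_training_pairs(corpus, shuffles_per_sentence=2):
    print(source, "->", target)
```

The resulting (input, target) pairs can then be fed to whichever model you decide to fine-tune.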

Answer 2

Thanks to text generation models like GPT-3, GPT-J, and GPT-NeoX, you can generate content from simple keywords.

For example, say you want to generate a product description from a couple of keywords. You could use few-shot learning and do something like this:

Generate a product description out of keywords.

Keywords: shoes, women, $59
Sentence: Beautiful shoes for women at the price of $59.
###
Keywords: trousers, men, $69
Sentence: Modern trousers for men, for $69 only.
###
Keywords: gloves, winter, $19
Sentence: Amazingly hot gloves for cold winters, at $19.
###
Keywords: t-shirt, men, $39
Sentence:

I actually wrote an article about this that you might find useful: effectively using GPT-J with few-shot learning
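
The prompt above follows a fixed pattern, so it can be assembled programmatically from a list of example pairs. A minimal sketch: `build_few_shot_prompt` is a hypothetical helper, and the call that sends the prompt to GPT-J or another model is left out because it depends on which API you use.

```python
def build_few_shot_prompt(examples, keywords,
                          instruction="Generate a product description out of keywords."):
    """Assemble a few-shot prompt in the Keywords/Sentence/### format.

    examples: list of (keywords, sentence) demonstration pairs.
    keywords: keywords for the new item; its "Sentence:" line is left open.
    """
    blocks = [f"Keywords: {', '.join(kws)}\nSentence: {sentence}"
              for kws, sentence in examples]
    # The final block ends at "Sentence:" so the model completes it.
    blocks.append(f"Keywords: {', '.join(keywords)}\nSentence:")
    return instruction + "\n\n" + "\n###\n".join(blocks)

examples = [
    (["shoes", "women", "$59"], "Beautiful shoes for women at the price of $59."),
    (["trousers", "men", "$69"], "Modern trousers for men, for $69 only."),
    (["gloves", "winter", "$19"], "Amazingly hot gloves for cold winters, at $19."),
]
prompt = build_few_shot_prompt(examples, ["t-shirt", "men", "$39"])
print(prompt)
```

If your API supports stop sequences, passing `###` as one should keep the model from continuing with invented examples after the sentence.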

Answer 3

What you want is called lexically constrained beam search in the natural language generation literature.

pip install -q git+https://github.com/huggingface/transformers.git

Then the following code generates a sentence that is forced to contain the words in the list:

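from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# Prompt for the encoder; the required words are supplied separately below
encoder_input_str = "Generate a sentence:"

# Words that must appear in the generated sentence
force_words = ["I", "school"]

input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids
# One list of token ids per forced word, without special tokens such as </s>
force_words_ids = tokenizer(force_words, add_special_tokens=False).input_ids

# Constrained beam search steers the beams so the output contains every forced word
outputs = model.generate(
    input_ids,
    force_words_ids=force_words_ids,
    num_beams=5,                 # constrained decoding needs beam search (num_beams > 1)
    num_return_sequences=1,
    no_repeat_ngram_size=1,      # forbid repeating any token
    remove_invalid_values=True,  # guard against nan/inf logits
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(outputs[0], skip_special_tokens=True))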

For further information refer to this.

If you don't want to use deep learning, index a large number of sentences with a retrieval system like Lucene, search for your keywords, and retrieve the sentence that is closest to your query.
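
Lucene ranks matches with BM25 by default, so the retrieval idea can be sketched without Lucene by scoring every indexed sentence against the keywords with a BM25-style formula. This is a pure-Python illustration only, not Lucene's implementation: the tokenizer is crude, `BM25Index` is a hypothetical class, and the example sentences are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphanumeric runs (a crude stand-in for a Lucene analyzer).
    return re.findall(r"[a-z0-9]+", text.lower())

class BM25Index:
    """Tiny in-memory BM25 index over sentences (illustrative, not Lucene)."""

    def __init__(self, sentences, k1=1.2, b=0.75):
        self.sentences = sentences
        self.docs = [Counter(tokenize(s)) for s in sentences]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        self.k1, self.b = k1, b
        # Document frequency: the number of sentences each term occurs in.
        self.df = Counter(term for d in self.docs for term in d)

    def idf(self, term):
        n, total_docs = self.df[term], len(self.docs)
        return math.log(1 + (total_docs - n + 0.5) / (n + 0.5))

    def score(self, i, query_terms):
        doc, dl = self.docs[i], self.lengths[i]
        total = 0.0
        for term in query_terms:
            f = doc[term]
            if f == 0:
                continue
            # Term-frequency saturation (k1) and length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / norm
        return total

    def search(self, keywords):
        terms = [t for kw in keywords for t in tokenize(kw)]
        best = max(range(len(self.docs)), key=lambda i: self.score(i, terms))
        return self.sentences[best]

index = BM25Index([
    "The children walked to school in the rain.",
    "I left my umbrella at school yesterday.",
    "Alligators like to dig through the dirt.",
])
print(index.search(["I", "school"]))
```

Rare keywords get a higher idf weight, so a sentence matching "I" and "school" outranks one that only matches the more common "school". A real system would add stemming, stop words, and an inverted index instead of scoring every sentence.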

Answer 4

You can use GPT-J. It is a free, open-source GPT-style model from EleutherAI, and its performance is roughly comparable to similarly sized GPT-3 models. The model takes the input you provide and tries to complete it.

How I use GPT-J to generate a sentence from a set of keywords:

Input:

Make a sentence with the following words: earth, dirt, alligator
Sentence: While the alligator is a species which mainly lives in the water, the earth is not uncommon territory and they like to dig through the dirt.

Make a sentence with the following words: shape, lantern, hair
Sentence: 

Output:

Make a sentence with the following words: earth, dirt, alligator
Sentence: While the alligator is a species which mainly lives in the water, the earth is not uncommon territory and they like to dig through the dirt.

Make a sentence with the following words: shape, lantern, hair
Sentence: The hair is so thick on the lantern that it is almost like a shape.
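
If you call the model programmatically instead of through the web demo, the prompt can be assembled and the generated sentence pulled out of the completion with a few small helpers. A minimal sketch: the function names are hypothetical, it assumes the returned text echoes the prompt (as in the Output above), and the model call itself is omitted.

```python
import re

def build_prompt(example_keywords, example_sentence, keywords):
    """One worked example followed by the new keywords, in the format shown above."""
    return (
        f"Make a sentence with the following words: {', '.join(example_keywords)}\n"
        f"Sentence: {example_sentence}\n\n"
        f"Make a sentence with the following words: {', '.join(keywords)}\n"
        "Sentence:"
    )

def extract_sentence(prompt, output):
    """Return the first line the model generated after the prompt."""
    completion = output[len(prompt):] if output.startswith(prompt) else output
    # The model may go on to invent more "Make a sentence..." examples; keep line one only.
    return completion.strip().split("\n", 1)[0].strip()

def missing_keywords(sentence, keywords):
    """Keywords that do not appear as whole words in the sentence (case-insensitive)."""
    words = set(re.findall(r"[a-z0-9'-]+", sentence.lower()))
    return [kw for kw in keywords if kw.lower() not in words]

prompt = build_prompt(
    ["earth", "dirt", "alligator"],
    "While the alligator is a species which mainly lives in the water, "
    "the earth is not uncommon territory and they like to dig through the dirt.",
    ["shape", "lantern", "hair"],
)
# Stand-in for a model response: the prompt echoed back plus the completion shown above.
output = prompt + " The hair is so thick on the lantern that it is almost like a shape.\n\n"
sentence = extract_sentence(prompt, output)
print(sentence)
print(missing_keywords(sentence, ["shape", "lantern", "hair"]))  # [] means all keywords were used
```

A non-empty result from `missing_keywords` is a cheap signal to regenerate, since nothing in plain prompting forces the model to use every keyword.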

How to tweak it for a specific use case?

Giving an example of what you want in the input (example keywords plus a sentence) can help GPT understand the structure of the desired output. In my experience, explicitly stating the desired task in the input ("Make a sentence...") also helps it understand the task.

You can change the complexity of the output sentence by changing the example sentence to something like: An alligator likes to dig dirt out of the earth.

How to use it?

Git repo: https://github.com/kingoflolz/mesh-transformer-jax

As shown in the repo, you can test the model with the web demo, or run it yourself using Colab.

Web demo: https://6b.eleuther.ai/

Colab notebook: http://colab.research.google.com/github/kingoflolz/mesh-transformer-jax/blob/master/colab_demo.ipynb

I do not recommend trying to run it locally.