Trigram model getting "IndexError: list index out of range" when choosing a random word

Asked by Jesper Ezra on 29 July 2025 at 16:57

I'm new to Python and need help with NLTK language modeling.

I'm trying to generate a sentence starting with "he said" using a trigram model, but I get the following error:

Traceback (most recent call last):
  File "C:\Users\PycharmProjects\homework3 3\main.py", line 77, in <module>
    suffix = pick_word(d[prefix])
  File "C:\Users\PycharmProjects\homework3 3\main.py", line 71, in pick_word
    return random.choice(sents)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2288.0_x64__qbz5n2kfra8p0\lib\random.py", line 378, in choice
    return seq[self._randbelow(len(seq))]
IndexError: list index out of range

I don't understand why it's complaining that the list index is out of range. What I think it should be doing is taking the Reuters sentences, picking a word from them at random, and passing that word as suffix.
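The failing line can be reproduced in isolation. A minimal sketch using only the standard library, with a small hypothetical table standing in for the one built from Reuters: `pick_word` receives `d[prefix]`, the list of words recorded after that two-word prefix, not the Reuters sentences. Because `d` is a `defaultdict(list)`, looking up a prefix that was never added does not raise `KeyError`; it silently inserts and returns an empty list, and `random.choice` on an empty list raises `IndexError` (the exact message varies by Python version).

```python
import random
from collections import defaultdict

# Hypothetical table: maps a two-word prefix to every word seen after it.
d = defaultdict(list)
d["they", "said"].extend(["the", "it", "the"])

def pick_word(words):
    """Choose a random element, as in the question."""
    return random.choice(words)

print(pick_word(d["they", "said"]))  # "the" or "it"

# A missing key is not an error for defaultdict: it inserts and returns [].
print(d["he", "said"])  # []

try:
    pick_word(d["he", "said"])  # random.choice([]) fails here
except IndexError as exc:
    print("IndexError:", exc)
```

So the error means the table has no trigram starting with ("he", "said"), which points back at how the table was filled rather than at pick_word itself.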

Here's the whole code. Please focus only on the trigram portion, as the rest is incomplete:

# imports
import string
import random

import nltk

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('reuters')
from nltk.corpus import reuters, stopwords
from collections import defaultdict
from nltk import FreqDist, ngrams

# input the reuters sentences
sents = reuters.sents()

# write the removal characters such as : Stopwords and punctuation
stop_words = set(stopwords.words('english'))
string.punctuation = string.punctuation + '"' + '"' + '-' + '''+''' + '—'
removal_list = list(stop_words) + list(string.punctuation) + ['lt', 'rt']

# generate unigrams bigrams trigrams
unigram = []
trigram = []
tokenized_text = []

for sentence in sents:
    sentence = list(map(lambda x: x.lower(), sentence))
for word in sentence:
    if word == '.':
        sentence.remove(word)
    else:
        unigram.append(word)

tokenized_text.append(sentence)
trigram.extend(list(ngrams(sentence, 3, pad_left=True, pad_right=True)))

# remove the n-grams with removable words
def remove_stopwords(x):
    y = []
    for pair in x:
        count = 0
        for word in pair:
            if word in removal_list:
                count = count or 0
            else:
                count = count or 1
        if (count == 1):
            y.append(pair)
    return (y)

trigram = remove_stopwords(trigram)

# generate frequency of n-grams
freq_tri = FreqDist(trigram)

d = defaultdict(list)

#Trigrams
for a, b, c in freq_tri:
    if (a != None and b != None and c != None):
        d[a, b].extend([c] * freq_tri[a,b,c])
#        print(" d[a, b].extend([c] * freq_tri[a,b,c]) ",  d[a, b].extend([c] * freq_tri[a,b,c]))

#Next word prediction
s = ''

def pick_word(sents):
    "Chooses a random element."
    return random.choice(sents)

prefix = "he", "said"
print(" ".join(prefix))
s = " ".join(prefix)
for i in range(19):
    suffix = pick_word(d[prefix])

What am I doing wrong? Am I wrong to assume that I'm passing the Reuters sentences in and choosing a word from them at random?

I thought maybe I was passing the wrong list to the pick_word function, so I tried using tokenized_text instead. I got the same error, so I think my assumption or understanding is wrong somewhere, but I'm not sure which part.
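As pasted, the listing's `for word in sentence:` loop and the `tokenized_text.append` / `trigram.extend` lines are not indented under `for sentence in sents:`, so they run only once, after the outer loop has finished, and only the last sentence contributes trigrams. A minimal sketch of the intended structure follows, assuming toy sentences in place of `reuters.sents()`, a plain `zip` in place of `nltk.ngrams` (no padding), and no stopword filtering; `build_table` and `generate` are hypothetical names. The per-sentence work sits inside the loop, the prefix is shifted after each pick, and generation stops when a prefix has no recorded continuation.

```python
import random
from collections import defaultdict

def build_table(sentences):
    """Map each (w1, w2) prefix to the list of words that followed it."""
    table = defaultdict(list)
    for sentence in sentences:
        # Everything below runs once per sentence because it is indented
        # under the loop. Build a new list instead of calling remove()
        # on the list being iterated, which skips elements.
        words = [w.lower() for w in sentence if w != "."]
        # Three shifted views zipped together yield consecutive trigrams
        # (a stand-in for nltk.ngrams without padding).
        for a, b, c in zip(words, words[1:], words[2:]):
            table[a, b].append(c)
    return table

def generate(table, prefix, max_words=19, rng=random):
    """Extend prefix one word at a time; stop early at a dead end."""
    out = list(prefix)
    for _ in range(max_words):
        # .get() returns None for an unseen prefix instead of inserting [].
        candidates = table.get(tuple(out[-2:]))
        if not candidates:
            break  # no trigram starts with the current two words
        out.append(rng.choice(candidates))
    return " ".join(out)

# Toy sentences in the shape reuters.sents() returns: lists of tokens.
sents = [
    ["He", "said", "the", "deal", "was", "off", "."],
    ["Analysts", "said", "the", "market", "was", "calm", "."],
]
table = build_table(sents)
print(table["he", "said"])  # ['the']
# Prints either "he said the deal was off" or "he said the market was calm".
print(generate(table, ("he", "said")))
```

Because generation uses table.get() and checks for an empty result, an unseen prefix such as ("nobody", "said") simply returns the prefix unchanged instead of raising IndexError.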
