Python NLTK Inaugural Text Corpora hands-on solution needed

Question


4.5k views · Asked by Tanuj Chadha on 28 June 2025 at 12:51

I am taking an NLTK Python course that has a hands-on problem (on Katacoda) on "Text Corpora", and it does not accept my solution below. I have been stuck on this problem for a long time, and I need to complete this hands-on to move forward in the course.

Problem definition:

Import the inaugural corpus. For each inaugural address text available in the corpus, perform the following: convert all words to lower case, then determine the number of words starting with america or citizen.

Hint: Compute a conditional frequency distribution, where the condition is the year in which the inaugural address was delivered and the event is either america or citizen. Store the conditional frequency distribution in the variable ac_cfd.
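To make the hint concrete, here is a minimal plain-Python sketch of what a conditional frequency distribution keyed by year holds: one counter of target words per year. It assumes each file id starts with the four-digit year, which is what `fileid[:4]` extracts. `toy_corpus` and `build_cfd` are hypothetical names, and the word lists are made up, so the counts are not the real inaugural counts.

```python
from collections import Counter, defaultdict

# Hypothetical toy stand-in for inaugural.fileids() / inaugural.words(fileid);
# these are NOT the real corpus texts or counts.
toy_corpus = {
    "1841-Harrison.txt": ["Citizens", "of", "America", "American", "citizen"],
    "1993-Clinton.txt": ["America", "Americans", "citizens", "world"],
}

TARGETS = ["america", "citizen"]


def build_cfd(corpus):
    """Return {year: Counter({target: count})}: the year is the condition
    and the matched target word is the sample."""
    cfd = defaultdict(Counter)
    for fileid, words in corpus.items():
        year = fileid[:4]  # file ids start with the four-digit year
        for w in words:
            for target in TARGETS:
                if w.lower().startswith(target):
                    cfd[year][target] += 1
    return cfd


ac_cfd = build_cfd(toy_corpus)
print(ac_cfd["1841"]["america"], ac_cfd["1841"]["citizen"])  # 2 2
print(ac_cfd["1993"]["america"], ac_cfd["1993"]["citizen"])  # 2 1
```

With NLTK the same structure comes from passing `(year, target)` pairs to `nltk.ConditionalFreqDist`; the sketch only shows what ends up inside it.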

Print the frequency of the words ['america', 'citizen'] in the years [1841, 1993].

Hint: Make use of the tabulate method associated with a conditional frequency distribution.
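As a rough illustration of the orientation `tabulate` uses (one row per condition, one column per sample), this hypothetical `format_table` helper lays out a dict of `Counter`s the same way. It is a plain-Python sketch, not NLTK's implementation, and the counts are made up.

```python
from collections import Counter


def format_table(cfd, conditions, samples):
    """Hypothetical helper: one row per condition, one column per sample,
    the same orientation ConditionalFreqDist.tabulate() prints."""
    width = max(len(str(x)) for x in [*conditions, *samples])
    header = " " * width + "".join(f" {s:>{width}}" for s in samples)
    rows = [
        f"{c:>{width}}" + "".join(f" {cfd.get(c, Counter())[s]:>{width}}" for s in samples)
        for c in conditions
    ]
    return "\n".join([header, *rows])


# Made-up counts keyed by year (condition), then by word (sample).
counts = {
    "1841": Counter({"america": 2, "citizen": 2}),
    "1993": Counter({"america": 2, "citizen": 1}),
}
table = format_table(counts, conditions=["1841", "1993"], samples=["america", "citizen"])
print(table)
```

Because the conditions label the rows, asking for "words in years" means the years go in `conditions` and the words go in `samples`.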

For this I have written the solution below:

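```python
import nltk
from nltk.corpus import inaugural

# Each pair is (condition, sample): here the word is the condition (row) and the year is the sample (column)
ac_cfd = nltk.ConditionalFreqDist(
    (target, fileid[:4])
    for fileid in inaugural.fileids()
    for w in inaugural.words(fileid)
    for target in ['america', 'citizen']
    if w.lower().startswith(target))
ac_cfd.tabulate(conditions=['america', 'citizen'], samples=['1841', '1993'])
```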

which gives this output:

```
          1841 1993
american     7   14
citizen     38    2
```

I was not able to find the same problem on other forums, though I did find a similar problem that plotted the conditional frequency distribution (https://www.nltk.org/book/ch02.html). Its solution was the same as mine with one difference: instead of the tabulate line it had plot. But Katacoda isn't accepting this solution, and I cannot move forward in the course because completing the hands-on is mandatory. Please help.


**balaji Vijayakumar** · Answer 1

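```python
import nltk
from nltk.corpus import inaugural

# (year, word) pairs: the year is the condition (row), the word is the sample (column)
ac_cfd = nltk.ConditionalFreqDist(
    [(fileid[:4], target) for fileid in inaugural.fileids() for w in inaugural.words(fileid)
     for target in ['america', 'citizen'] if w.lower().startswith(target)])
ac_cfd.tabulate(conditions=['1841', '1993'], samples=['america', 'citizen'])
```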

The question asks you to print the frequency of the words ['america', 'citizen'] in the years [1841, 1993], so the years must be the conditions and the words the samples. You were doing the reverse, which is why your solution is not accepted.
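A small plain-Python sketch of why the pair order matters: the first element of each pair becomes the condition and the second the sample, so swapping them transposes the table while leaving every count unchanged. `cfd_from_pairs` and `events` are hypothetical stand-ins for `nltk.ConditionalFreqDist` and the corpus loop.

```python
from collections import Counter, defaultdict

# Hypothetical (year, word) events standing in for the corpus loop.
events = [("1841", "america"), ("1841", "citizen"), ("1841", "citizen"),
          ("1993", "america")]


def cfd_from_pairs(pairs):
    """The first element of each pair is the condition and the second is the
    sample, the same convention nltk.ConditionalFreqDist uses for its input."""
    cfd = defaultdict(Counter)
    for condition, sample in pairs:
        cfd[condition][sample] += 1
    return cfd


by_year = cfd_from_pairs(events)                     # conditions: years
by_word = cfd_from_pairs((w, y) for y, w in events)  # conditions: words

# Same counts, transposed axes.
assert by_year["1841"]["citizen"] == by_word["citizen"]["1841"] == 2
print(sorted(by_year), sorted(by_word))  # ['1841', '1993'] ['america', 'citizen']
```

So `tabulate(conditions=['1841', '1993'], ...)` only has year rows to show when the pairs are built as `(year, word)`.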

**Anil Chamoli** · Answer 2

Use the code below. It works for me on Katacoda. The question asks for words starting with america and citizen, so I sliced each word to its first 7 characters.
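Because both targets are exactly seven letters long, slicing the lower-cased word to 7 characters and comparing gives the same result as `startswith`. A quick plain-Python check, using made-up sample words and two hypothetical helper names:

```python
def starts_with_slice(word, target):
    # Answer 2's trick: compare the first len(target) characters.
    return word.lower()[:len(target)] == target


def starts_with_method(word, target):
    # Answer 1's approach.
    return word.lower().startswith(target)


words = ["America", "Americans", "citizenship", "Citizen", "amerika", "city"]
for target in ["america", "citizen"]:  # both targets are exactly 7 characters
    for w in words:
        assert starts_with_slice(w, target) == starts_with_method(w, target)

print([w for w in words if w.lower()[:7] == "america"])  # ['America', 'Americans']
```

The difference is what else gets counted: the slicing version stores a 7-character prefix for every word in the corpus, and `tabulate` then shows only the two requested sample columns.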

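```python
import nltk
from nltk.corpus import inaugural

# Condition: year; sample: first 7 characters of each lower-cased word
ac_cfd = nltk.ConditionalFreqDist([(fileid[:4], word.lower()[:7])
                                   for fileid in inaugural.fileids()
                                   for word in inaugural.words(fileid)])

# tabulate() prints the table itself and returns None, so it is not wrapped in print()
ac_cfd.tabulate(conditions=['1841', '1993'], samples=['america', 'citizen'])
```

Output:

```
     america citizen
1841       7      38
1993      33       2
```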
