I currently have a file that contains a list that looks like this:
example = ['Mary had a little lamb' ,
'Jack went up the hill' ,
'Jill followed suit' ,
'i woke up suddenly' ,
'it was a really bad dream...']
I would like to find the index of the sentence that contains a given word, for example "woke". In this example the answer should be f("woke") = 3, where f is the function I am looking for.
I tried tokenizing each sentence first, so that I could find the index of the word, like this:
>>> from nltk.tokenize import word_tokenize
>>> example = ['Mary had a little lamb' ,
... 'Jack went up the hill' ,
... 'Jill followed suit' ,
... 'i woke up suddenly' ,
... 'it was a really bad dream...']
>>> tokenized_sents = [word_tokenize(i) for i in example]
>>> for i in tokenized_sents:
...     print(i)
...
['Mary', 'had', 'a', 'little', 'lamb']
['Jack', 'went', 'up', 'the', 'hill']
['Jill', 'followed', 'suit']
['i', 'woke', 'up', 'suddenly']
['it', 'was', 'a', 'really', 'bad', 'dream', '...']
But I don’t know how to finally get the index of the word and how to link it to the sentence’s index. Does someone know how to do that?
You can iterate over each string in the list, split it on whitespace, and check whether your search word is in the resulting list of words. If you do this in a list comprehension with enumerate, you get back a list of the indices of all strings that contain the word.
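A minimal sketch of that approach is below. The name f comes from the question; the parameter names are only illustrative. It returns a list rather than a single index, because more than one sentence may contain the word.

```python
example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']

def f(word, sentences):
    # enumerate() pairs each sentence with its index; split() with no
    # arguments splits on any run of whitespace.
    return [i for i, sentence in enumerate(sentences)
            if word in sentence.split()]

print(f('woke', example))  # [3]
print(f('up', example))    # [1, 3]
```

Note that this matches whole whitespace-separated words only, so 'dream' would not match 'dream...' in the last sentence.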
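If you prefer using the nltk library, you can tokenize each sentence with word_tokenize instead of splitting on whitespace and apply the same membership test.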
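One way the nltk variant could look, as a sketch: the tokenizer is passed in as an argument, so the same function works with either str.split or word_tokenize. The function name sentence_indices and its tokenize parameter are assumptions made for illustration, and the nltk branch assumes that nltk and its tokenizer data are installed. If they are not, the sketch falls back to plain whitespace splitting.

```python
example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']

def sentence_indices(word, sentences, tokenize=str.split):
    # tokenize is any callable that turns a sentence into a list of tokens.
    return [i for i, sentence in enumerate(sentences)
            if word in tokenize(sentence)]

try:
    from nltk.tokenize import word_tokenize
    # word_tokenize separates 'dream' from '...', so the last sentence matches.
    print(sentence_indices('dream', example, tokenize=word_tokenize))
except (ImportError, LookupError):
    # nltk or its tokenizer data is missing: use whitespace splitting,
    # which leaves 'dream...' as a single token and finds no match.
    print(sentence_indices('dream', example))
```

The practical difference is punctuation handling: the tokenized output in the question shows 'dream' and '...' as separate tokens, whereas split() keeps them together.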