I am trying to take a text file, pick out all the words longer than three letters, and print them in a column. I then want to match them with the line numbers that they appear on, in a second column. e.g.
Chicken 8,7
Beef 9,4,1
....
The problem is I don't want to have duplicates. Right now I have the word kings which appears in a line twice, and I only want it to print once. I am thoroughly stumped and am in need of the assistance of a wise individual.
My Code:
storyFile=open('StoryTime.txt', 'r')
def indexMaker(inputFile):
    ''
    # Will scan in each word at a time and either place in index as a key or
    # add to value.
    index = {}
    lineImOn = 0
    for line in inputFile:
        individualWord = line[:-1].split(' ')
        lineImOn+=1
        placeInList=0
        for word in individualWord:
            index.get(individualWord[placeInList])
            if( len(word) > 3): #Makes sure all words are longer than 3 letters
                if(not individualWord[placeInList] in index):
                    index[individualWord[placeInList]] = [lineImOn]
                elif(not index.get(individualWord[placeInList]) == str(lineImOn)):
                    type(index.get(individualWord[placeInList]))
                    index[individualWord[placeInList]].append(lineImOn)
            placeInList+=1
    return(index)
print(indexMaker(storyFile))
Also if anyone knows anything about making columns you would be a huge help and my new best friend.
I would do this using a dictionary of sets to keep track of the line numbers. Actually, to simplify things a bit, I'd use a collections.defaultdict with values of type set. As mentioned in another answer, it's probably best to parse out the words using a regular expression via the re module.

Alternative not using the re module:

Either way, the make_index() function could be used and the results output in two columns like this:

As a test case I used the following passage (notice the word "die" is in the last line twice):
And get the following results:
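
A minimal sketch of the approach described in this answer, assuming the input is any iterable of lines (an open file works) and that a word is a run of ASCII letters. Only the name make_index() comes from the text above; make_index_no_re(), format_index(), and the sample lines are made up for illustration and are not the passage mentioned earlier.

```python
import re
import string
from collections import defaultdict


def make_index(lines):
    """Map each word longer than three letters to the set of line numbers it appears on."""
    index = defaultdict(set)
    for line_num, line in enumerate(lines, start=1):
        # Assumption: a "word" is a run of ASCII letters.
        for word in re.findall(r"[A-Za-z]+", line):
            if len(word) > 3:
                # Adding the same line number twice leaves the set unchanged,
                # so a word repeated on one line is recorded only once.
                index[word].add(line_num)
    return index


def make_index_no_re(lines):
    """Same idea without re: split on whitespace and strip surrounding punctuation."""
    index = defaultdict(set)
    for line_num, line in enumerate(lines, start=1):
        for token in line.split():
            word = token.strip(string.punctuation)
            if len(word) > 3:
                index[word].add(line_num)
    return index


def format_index(index):
    """Return one row per word: the word left-justified, then its sorted line numbers."""
    width = max(map(len, index), default=0)
    return [
        f"{word:<{width}}  {', '.join(map(str, sorted(nums)))}"
        for word, nums in sorted(index.items())
    ]


# Made-up sample input; "kings" appears twice on the first line.
sample = [
    "The kings met the other kings",
    "Beef and chicken for the kings",
    "chicken again",
]
for row in format_index(make_index(sample)):
    print(row)
```

Because the values are sets, there is no need to check whether a line number was already recorded before adding it, which is what removes the duplicates. Padding every word to the width of the longest one is what lines up the second column.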