create a tuple of tokens and texts for a conditional frequency distribution

I'd like to create a table that shows the frequencies of certain words in 3 texts, where the texts are the columns and the words are the rows.

In the table I'd like to see which word appears how often in which text.

These are my texts and words:

texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']

In order to create a conditional frequency distribution, I wanted to create a list of tuples that should look like lot = [('text1', 'blood'), ('text1', 'young'), ... ('text2', 'blood'), ...]

I tried to create lot like this:

lot = [(words, texte)
    for word in words
    for text in texts]

Instead of tuples like ('text1', 'blood'), the list contains the whole text where 'text1' should be.

How can I create the list of tuples as intended for the conditional frequency distribution function?

There are 2 best solutions below

Hopefully I have understood your question correctly. I think you are putting the whole lists into each tuple, rather than the loop variables 'word' and 'text'.

Try the following:

texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']
lot = [(word, text)
    for word in words
    for text in texts]

Edit: Because the change is so subtle, I should elaborate a bit more. In your original code the tuple was built from 'words' and 'texts', i.e. you were putting the whole list into every tuple rather than one element of the list at a time.
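If the goal is literally the list shown in the question, with the strings 'text1', 'text2', ... rather than the contents of the texts, one possible sketch is to loop over a separate list of names. The 'labels' list is an assumption for illustration and does not appear in the question.

```python
# Hypothetical label list: one name per text, in the same order as texts.
labels = ['text1', 'text2', 'text3']
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night',
         'happiness', 'heart', 'horse']

# Every (label, word) combination, grouped by label as in the question.
lot = [(label, word)
       for label in labels
       for word in words]

print(lot[:3])
```

Note that this only pairs every label with every word; it does not count anything yet. The counts have to come from the tokens of each text.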

I think this nested list comprehension might be what you're trying to do?

lot = [(word, 'text'+str(i))
    for i,text in enumerate(texts)
    for word in text.split()
    if word in words]

However, you might want to consider using a Counter instead:

from collections import Counter

# One inner dict per word, so counts[word][label] can be assigned directly.
counts = {word: {} for word in words}
for i, text in enumerate(texts):
    C = Counter(text.split())
    for word in words:
        # A Counter returns 0 for missing keys, so no membership test is needed.
        counts[word]['text'+str(i)] = C[word]
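To get from such a nested dict to the table the question asks for (texts as columns, words as rows), something like the following might work. It is a minimal sketch: the sample strings are made up, and it assumes each text is a plain string that can be tokenized with split().

```python
from collections import Counter

# Hypothetical stand-ins for the real texts, assumed to be plain strings.
texts = [
    "blood and fear in the night",
    "a young man and a young woman",
    "mercy for the horse and the man",
]
words = ['blood', 'young', 'man', 'horse']
# Labels text0, text1, ... in the style produced by 'text'+str(i).
labels = ['text' + str(i) for i in range(len(texts))]

# counts[word][label] holds how often word occurs in that text.
counts = {word: {} for word in words}
for label, text in zip(labels, texts):
    c = Counter(text.split())
    for word in words:
        counts[word][label] = c[word]  # 0 if the word is absent

# Header row with the text labels, then one row per word.
rows = ["".join(f"{cell:>8}" for cell in [""] + labels)]
for word in words:
    cells = "".join(f"{counts[word][label]:>8}" for label in labels)
    rows.append(f"{word:>8}" + cells)
print("\n".join(rows))
```

The fixed column width of 8 is arbitrary and would need adjusting for longer words or labels.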