Create a list of (text, word) tuples for a conditional frequency distribution
I would like to create a table that shows the frequencies of some words in three texts, where the texts are the columns and the words are the rows.
The table should show how often each word appears in each text.
These are my texts and words:
texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']
To create a conditional frequency distribution, I wanted to build a list of tuples that looks like lot = [('text1', 'blood'), ('text1', 'young'), ..., ('text2', 'blood'), ...]
I tried to create the list like this:
lot = [(words, texts)
       for word in words
       for text in texts]
Instead of tuples such as ('text1', 'blood'), every tuple contains the complete list of words and the complete list of texts.
How do I create the list of tuples that a conditional frequency distribution expects?
I hope I understood your question correctly. I think you are putting the whole "words" and "texts" lists into each tuple instead of the loop variables "word" and "text".
Try the following:
texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']
lot = [(word, text)
       for word in words
       for text in texts]
Edit: Since the change is so subtle, let me explain a little more. In the original code, you put "words" and "texts" into each tuple, i.e. you were adding the entire lists rather than the individual elements "word" and "text" from the loops.
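To make the difference visible, here is a minimal sketch. The short strings are made-up placeholders standing in for the real lyrics, and the name whole_lists is only for illustration.

```python
# Placeholder strings standing in for the real lyrics (assumption).
texts = ["blood and fear", "young heart", "night of mercy"]
words = ["blood", "young"]

# Original version: every tuple holds the two complete lists.
whole_lists = [(words, texts) for word in words for text in texts]

# Corrected version: every tuple holds one word and one text.
pairs = [(word, text) for word in words for text in texts]

print(whole_lists[0])  # (['blood', 'young'], ['blood and fear', 'young heart', 'night of mercy'])
print(pairs[0])        # ('blood', 'blood and fear')
```

Note that each tuple in the corrected version pairs a word with the full content of a text, not with a label such as 'text1', and it pairs every word with every text whether or not the word occurs there.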
I think this nested list comprehension might be what you are trying to do?
lot = [(word, 'text' + str(i))
       for i, text in enumerate(texts, start=1)  # start=1 so the labels are text1, text2, text3
       for word in text.split()
       if word in words]
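A list built this way has one (word, label) tuple per occurrence, so counting the tuples gives the conditional frequencies. Below is a small sketch using only the standard library; the sample texts are invented, the labels are numbered from 1 here, and pair_counts is a name chosen for this example. NLTK's ConditionalFreqDist accepts a list of pairs in the same shape, treating the first element as the condition.

```python
from collections import Counter

# Invented sample texts standing in for the real lyrics (assumption).
texts = ["blood on the young man",
         "the young heart knows no fear",
         "night after night"]
words = ["blood", "young", "fear", "night"]

# One (word, label) tuple for every occurrence of a word of interest.
lot = [(word, "text" + str(i))
       for i, text in enumerate(texts, start=1)
       for word in text.split()
       if word in words]

# Counting identical tuples yields the frequency of each word per text.
pair_counts = Counter(lot)

print(pair_counts[("night", "text3")])  # 2
print(pair_counts[("blood", "text2")])  # 0 (a Counter returns 0 for unseen keys)
```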
However, you may want to use a Counter instead:
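from collections import Counter

# Create one inner dict per word up front; with a plain {} the
# assignment counts[word][...] below would raise a KeyError.
counts = {word: {} for word in words}
for i, text in enumerate(texts, start=1):
    c = Counter(text.split())
    for word in words:
        # A Counter returns 0 for missing keys, so no if/else is needed.
        counts[word]['text' + str(i)] = c[word]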
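To show counts like these as the table asked for in the question, with words as rows and texts as columns, something along the following lines might work. It is only a sketch: the sample texts are invented, and labels, counters, header, rows and table are names picked for this example.

```python
from collections import Counter

# Invented sample texts standing in for the real lyrics (assumption).
texts = ["blood on the young man",
         "the young heart knows no fear",
         "night after night"]
words = ["blood", "young", "fear", "night"]

labels = ["text" + str(i) for i in range(1, len(texts) + 1)]
counters = [Counter(text.split()) for text in texts]

# First column is wide enough for the longest word plus some padding.
width = max(len(word) for word in words) + 2

# Header row with the text labels, then one row per word.
header = " " * width + "".join(label.rjust(8) for label in labels)
rows = [word.ljust(width) + "".join(str(c[word]).rjust(8) for c in counters)
        for word in words]

table = "\n".join([header] + rows)
print(table)
```

Splitting on whitespace does not strip punctuation or normalise case, so real lyrics would probably need lowercasing and a proper tokenizer first.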