Create a list of (text, word) tuples for a conditional frequency distribution
I would like to create a table that shows the frequencies of some words in three texts, with texts as columns and words as rows.
The table should show how often each word appears in each text.
These are my texts and words:
texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']
To create a conditional frequency distribution, I wanted to build a list of tuples that looks like lot = [('text1', 'blood'), ('text1', 'young'), ... ('text2', 'blood'), ...]
I tried to build the list like this:
lot = [(words, texts)
       for word in words
       for text in texts]
However, instead of tuples like ('text1', 'blood'), every tuple contains the entire list of texts where 'text1' should be.
How do I create a list of tuples suitable for a conditional frequency distribution?
I hope I understood your question correctly. I think you are putting the entire words and texts lists into each tuple instead of the loop variables word and text.
Try the following:
texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']
lot = [(word, text)
       for word in words
       for text in texts]
Edit: Since the change is subtle, let me elaborate. In the original code, you put words and texts into each tuple, i.e. you were adding the entire lists rather than the individual elements.
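To make the difference concrete, here is a minimal sketch. The short strings standing in for text1 to text3 are made up for illustration and are not the asker's data. It shows that (words, texts) repeats the two complete lists in every tuple, while (word, text) pairs one word with one text.

```python
# Hypothetical stand-ins for the asker's text1..text3.
texts = ["blood and night", "young heart", "fear the night"]
words = ["blood", "night"]

# Original attempt: every tuple holds the two complete lists.
wrong = [(words, texts) for word in words for text in texts]

# Corrected version: every tuple holds one word and one text.
right = [(word, text) for word in words for text in texts]

print(wrong[0])  # (['blood', 'night'], ['blood and night', 'young heart', 'fear the night'])
print(right[0])  # ('blood', 'blood and night')
print(len(wrong), len(right))  # 6 6
```

Both lists have the same length, which is why the mistake is easy to miss: only the contents of each tuple differ.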
I think this nested list comprehension might be what you are trying to do?
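# Labels start at 1 to match 'text1', 'text2', ...; the label comes first, as in the question.
lot = [('text' + str(i), word)
       for i, text in enumerate(texts, start=1)
       for word in text.split()
       if word in words]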
However, you may want to use Counter instead:
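from collections import Counter

# One inner dict per word, keyed by text label ('text1', 'text2', ...).
counts = {word: {} for word in words}
for i, text in enumerate(texts, start=1):
    c = Counter(text.split())
    for word in words:
        # Counter returns 0 for missing keys, so no membership test is needed.
        counts[word]['text' + str(i)] = c[word]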
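As a rough sketch of how such per-text counts could be turned into the table the question asks for (words as rows, texts as columns), the following uses only the standard library. The sample texts, the labels text1 to text3, and the helper name count_table are assumptions made for illustration, and tokenisation is a naive whitespace split.

```python
from collections import Counter

# Hypothetical stand-ins for the asker's text1..text3.
texts = ["blood and night and blood", "young heart", "fear the night"]
words = ["blood", "night", "heart"]

def count_table(texts, words):
    """Return {word: {label: count}} with labels 'text1', 'text2', ..."""
    table = {word: {} for word in words}
    for i, text in enumerate(texts, start=1):
        tokens = Counter(text.split())  # naive whitespace tokenisation
        for word in words:
            table[word]["text" + str(i)] = tokens[word]  # 0 if the word is absent
    return table

table = count_table(texts, words)
labels = ["text" + str(i) for i in range(1, len(texts) + 1)]

# Header row, then one row per word.
print("word".ljust(10) + "".join(label.rjust(8) for label in labels))
for word in words:
    print(word.ljust(10) + "".join(str(table[word][label]).rjust(8) for label in labels))
```

If NLTK is available, (label, word) pairs can instead be passed to nltk.ConditionalFreqDist, which expects (condition, sample) tuples.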