Create a tuple of tokens and texts for conditional frequency distribution

Question

I would like to create a table that shows the frequencies of some words in three texts, with the texts as columns and the words as rows.

In the table, I would like to see how often each word appears in each text.

These are my texts and words:

texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']

To create a conditional frequency distribution, I wanted to create a list of tuples that should look like lot = [('text1', 'blood'), ('text1', 'young'), ... ('text2', 'blood'), ...]

I tried to create the list like this:

lot = [(words, texts)
    for word in words
    for text in texts]

But instead of tuples like ('text1', 'blood'), each tuple contains the entire list of texts where I expected 'text1'.

How do I create a list of tuples suitable for a conditional frequency distribution function?

python tuples frequency-distribution

Fadinha · June 21, 2015 at 23:55

2 answers

user3636636 · Answer 1 · 2015-06-22T01:13:40+0000

I hope I understood your question correctly. I think you are putting the entire "words" and "texts" lists into each tuple instead of the loop variables "word" and "text".

Try the following:

texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']
lot = [(word, text)
    for word in words
    for text in texts]

Edit: Since the change is so subtle, I should elaborate a little. In the original code, you put "words" and "texts" themselves into the tuple, i.e. you were inserting the entire list each time rather than each element of the list.
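To make the difference concrete, here is a minimal sketch. The sample texts and the shortened word list are made up for illustration, and the labelled variant at the end is only my guess at the ('text1', 'blood') shape the question asks for.

```python
# Hypothetical sample data standing in for the asker's three texts.
texts = ["blood and fear", "young heart", "night horse"]
words = ["blood", "heart"]

# Original attempt: the tuple holds the whole lists on every iteration.
broken = [(words, texts) for word in words for text in texts]

# Corrected version: the tuple holds the loop variables, one element each.
fixed = [(word, text) for word in words for text in texts]

# Variant with labels instead of the full text, label first (assumed layout).
labels = ["text" + str(i) for i in range(1, len(texts) + 1)]
labelled = [(label, word) for label in labels for word in words]

print(broken[0])    # the two complete lists
print(fixed[0])     # one word paired with one text
print(labelled[0])  # one label paired with one word
```

Note that the labelled variant pairs every label with every word regardless of whether the word occurs in that text, so it gives the shape of the list but not yet the frequencies.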

maxymoo · Answer 2 · 2015-06-22T01:15:42+0000

I think this nested list comprehension might be what you are trying to do?

lot = [(word, 'text'+str(i))
    for i,text in enumerate(texts)
    for word in text.split()
    if word in words]
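As far as I know, NLTK's ConditionalFreqDist expects (condition, sample) pairs, so if the texts should be the conditions, the label goes first in each tuple. The sketch below builds such pairs and counts them with the standard library only, as a stand-in for what a conditional frequency distribution computes. The sample texts are hypothetical, and the labels are numbered from 1 to match the names in the question.

```python
from collections import Counter, defaultdict

# Hypothetical sample data; the real texts come from the question.
texts = ["blood and fear and blood", "young heart", "fear of the night"]
words = ["blood", "fear", "heart"]

# One (label, word) pair per occurrence of a tracked word in a text.
pairs = [("text" + str(i), word)
         for i, text in enumerate(texts, start=1)
         for word in text.split()
         if word in words]

# Stdlib stand-in for a conditional frequency distribution:
# one Counter of words per text label.
cfd = defaultdict(Counter)
for condition, sample in pairs:
    cfd[condition][sample] += 1

for label in sorted(cfd):
    print(label, dict(cfd[label]))
```

The same pairs list could be passed to a conditional frequency distribution class directly, provided it accepts a list of (condition, sample) tuples.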

However, you may want to use Counter instead:

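from collections import Counter, defaultdict

# counts[word][text_label] -> frequency; defaultdict avoids a KeyError
# the first time a word is seen
counts = defaultdict(dict)
for i, text in enumerate(texts):
    c = Counter(text.split())
    for word in words:
        # a Counter returns 0 for missing keys, so no membership test is needed
        counts[word]['text'+str(i)] = c[word]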
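Since the question asks for a table with words as rows and texts as columns, here is a rough sketch of how the counts could be printed that way. The helper format_table and the sample texts are my own invention for illustration, and the labels are numbered from 1 to match the question.

```python
from collections import Counter

# Hypothetical sample data; the real texts come from the question.
texts = ["blood and fear and blood", "young heart", "fear of the night"]
words = ["blood", "fear", "heart"]

labels = ["text" + str(i) for i in range(1, len(texts) + 1)]
counters = [Counter(text.split()) for text in texts]

def format_table(words, labels, counters, col_width=8):
    """Return a plain-text table: one row per word, one column per text."""
    first_col = max(len(word) for word in words) + 2
    header = " " * first_col + "".join(label.rjust(col_width) for label in labels)
    lines = [header]
    for word in words:
        # Counter lookups return 0 for words that never occur in a text.
        cells = "".join(str(c[word]).rjust(col_width) for c in counters)
        lines.append(word.ljust(first_col) + cells)
    return "\n".join(lines)

table = format_table(words, labels, counters)
print(table)
```

This splits on whitespace only, so punctuation and capitalisation in real lyrics would need to be normalised first for the counts to be meaningful.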
