Manipulating strings in a Python list

Question

I have a list of tweets grouped into chunks, that is, a list of lists of tweets. For example:

[[tweet1, tweet2, tweet3],[tweet4,tweet5,tweet6],[tweet7, tweet8, tweet9]]

I want to count the number of occurrences of each word in each subgroup. To do this, I need to split each tweet into separate words. I tried something similar to str.split(' '), but I am getting an error:

AttributeError: 'list' object has no attribute 'split'

Is there a way to split each tweet into separate words? The result should look something like this:

[['word1', 'word2', 'word3', 'word2', 'word2'],['word1', 'word1', 'word3', 'word4', 'word5'],['word1', 'word3', 'word3', 'word5', 'word6']]

python

user3745115, Apr 22 at 1:22 am

5 answers

tobyodavies · Answer 1 · 2015-04-22T01:28:15+0000

If you have a list of strings:

tweets = ['a tweet', 'another tweet']

Then you can split each item using a list comprehension:

split_tweets = [tweet.split(' ')
                for tweet in tweets]

Since what you have is a list of lists of tweets, use a nested comprehension:

tweet_groups = [['tweet 1', 'tweet 1b'], ['tweet 2', 'tweet 2b']]
tweet_group_words = [[word
                      for tweet in group
                      for word in tweet.split(' ')]
                     for group in tweet_groups]

This gives a list of word lists, one per group.

If you only want the distinct words in each group, build a set instead:

words = [set(word 
             for tweet in group
             for word in tweet.split(' '))
         for group in tweet_groups]
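
One caveat worth checking on real tweets: split(' ') splits on every single space, so doubled spaces produce empty strings and newlines or tabs stay inside words, whereas split() with no argument splits on any run of whitespace. A minimal sketch, using a made-up sample string:

```python
# Hypothetical sample tweet containing a double space and a newline.
tweet = 'hello  world\nagain'

by_space = tweet.split(' ')    # splits on each single space character
by_whitespace = tweet.split()  # splits on any run of whitespace

print(by_space)       # ['hello', '', 'world\nagain']
print(by_whitespace)  # ['hello', 'world', 'again']
```

If the tweets are already normalised to single spaces, both calls give the same result.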

sshashank124 · Answer 2 · 2015-04-22T01:27:54+0000

You want something like this:

l1 = [['a b', 'c d', 'e f'], ['a b', 'c d', 'e f'], ['a b', 'c d', 'e f']]

l2 = []
for i,j in enumerate(l1):
    l2.append([])
    for k in j:
        l2[i].extend(k.split())

print(l2)
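
For context, the AttributeError in the question most likely comes from calling split on a sub-list (a group of tweets) rather than on an individual tweet string. That is an assumption, since the failing code was not posted. A small sketch that reproduces the error and shows the one-level-deeper fix, using made-up data:

```python
groups = [['a b', 'c d'], ['e f']]  # hypothetical sample data

try:
    groups[0].split()  # groups[0] is a list, not a str
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'list' object has no attribute 'split'

# Go one level deeper: split each tweet string inside each group.
words_per_group = [[word for tweet in group for word in tweet.split()]
                   for group in groups]
print(words_per_group)  # [['a', 'b', 'c', 'd'], ['e', 'f']]
```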

Amadan · Answer 3 · 2015-04-22T01:29:11+0000

groups = [["foo bar", "bar baz"], ["foo foo"]]
[sum((tweet.split(' ') for tweet in group), []) for group in groups]
# => [['foo', 'bar', 'bar', 'baz'], ['foo', 'foo']]

EDIT: Seems like an explanation is needed.

For each group: [... for group in groups]
- split every tweet in the group into words: (tweet.split(' ') for tweet in group)
- concatenate the resulting word lists into one list: sum(..., [])
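
Note that sum(..., []) builds a new list at every addition, so it can get slow for large groups. As an alternative sketch (not part of the answer above), itertools.chain.from_iterable flattens the split tweets in one pass and gives the same result here:

```python
from itertools import chain

groups = [["foo bar", "bar baz"], ["foo foo"]]

# Flatten the per-tweet word lists of each group without repeated list copies.
flattened = [list(chain.from_iterable(tweet.split(' ') for tweet in group))
             for group in groups]

# Same result as the sum(..., []) version.
summed = [sum((tweet.split(' ') for tweet in group), []) for group in groups]
print(flattened == summed)  # True
```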

Padraic Cunningham · Answer 4 · 2015-04-22T01:44:59+0000

If you want to count occurrences, use a Counter dict, chaining all the words together with itertools.chain after splitting each tweet.

from collections import Counter
from itertools import chain

tweets = [['foo bar', 'foo foobar'], ['bar foo', 'bar']]
print([Counter(chain.from_iterable(map(str.split, sub))) for sub in tweets])
# Output (Python 3.7+, equal counts shown in first-seen order):
# [Counter({'foo': 2, 'bar': 1, 'foobar': 1}), Counter({'bar': 2, 'foo': 1})]
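
Once you have one Counter per group, looking up individual words or the most frequent ones is straightforward. A short sketch of typical usage on the same sample data (the variable name counts is only for illustration):

```python
from collections import Counter
from itertools import chain

tweets = [['foo bar', 'foo foobar'], ['bar foo', 'bar']]
counts = [Counter(chain.from_iterable(map(str.split, sub))) for sub in tweets]

print(counts[0]['foo'])          # 2
print(counts[0]['missing'])      # 0, a Counter returns zero for unseen words
print(counts[1].most_common(1))  # [('bar', 2)]
```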

Anatzum · Answer 5 · 2015-04-22T02:24:50+0000

You can write a function that takes your list and returns a dictionary mapping each word to the number of times it appears in your tweets.

def countWords(listitem):
    a = []
    for x in listitem:
        for y in x:
            for z in y.split(' '):
                a.append(z)
    b = {}
    for word in a:
        if word not in b:
            b[word] = 1
        else:
            b[word] += 1
    return b

This way the original list is left unchanged, and you can assign the return value to a new variable:

dictvar = countWords(listoftweets)

Defining a function also lets you put it in a file of your own (a module) and reuse it in the future.
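
Note that countWords as written returns a single dictionary for the whole list. If you want a separate count for each subgroup, as the question asks, one possible variation is sketched below. The name count_words_per_group is hypothetical, and the sketch uses dict.get to shorten the counting loop.

```python
def count_words_per_group(groups):
    """Return one {word: count} dict per group of tweets."""
    result = []
    for group in groups:
        counts = {}
        for tweet in group:
            for word in tweet.split():
                # dict.get returns 0 the first time a word is seen.
                counts[word] = counts.get(word, 0) + 1
        result.append(counts)
    return result

# Hypothetical sample data.
listoftweets = [['foo bar', 'foo foobar'], ['bar foo', 'bar']]
print(count_words_per_group(listoftweets))
# [{'foo': 2, 'bar': 1, 'foobar': 1}, {'bar': 2, 'foo': 1}]
```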
