Why is my pandas DataFrame not updating its values when I change them?

I'm trying to modify every entry in my "Tweet_Text" Series, but when I print the Series after making the changes in the for loop, I get the same lines as before the loop. How can I fix this?

import pandas as pd
import re
import string

df = pd.read_csv('sample-tweets.csv',
                 names=['Tweet_Date', 'User_ID', 'Tweet_Text', 'Favorites', 'Retweets', 'Tweet_ID'])

sum_df = df[['User_ID', 'Tweet_ID', 'Tweet_Text']].copy()
# print sum_df

tweet_text = df.ix[:, 2]
print(type(tweet_text))

# efficiency could be improved by using the translate method
# regex = re.compile('[%s]' % re.escape(string.punctuation))

count = 0
for tweet in tweet_text:
    tweet = re.sub(r'https://t\.co/[a-zA-Z0-9]*', '', tweet)
    tweet = re.sub(r'@[a-zA-Z0-9]*', '', tweet)
    tweet = re.sub(r'#[a-zA-Z0-9]*', '', tweet)
    tweet = re.sub(r'\$[a-zA-Z0-9]*', '', tweet)  # escape $, otherwise it matches end-of-string
    tweet = ''.join(i for i in tweet if not i.isdigit())
    tweet = tweet.replace('"', '')
    tweet = re.sub(r'[\(\[].*?[\)\]]', '', tweet)  # takes out everything between parentheses also, fix this

    # gets rid of all punctuation and emojis (non-ASCII characters)
    tweet = "".join(l for l in tweet if l not in string.punctuation)
    tweet = re.sub(r'[^\x00-\x7F]+',' ', tweet)

    # lowercases and gets rid of all extra spacing
    tweet = tweet.lower()
    tweet = tweet.strip()
    tweet = " ".join(tweet.split())

    count = count + 1
    # print tweet

print(tweet_text)




2 answers

It does this because, for starters, tweet_text may be a copy of the column df.ix[:, 2] rather than a view into df, so there is no guarantee that changes to it reach df. Secondly, looping over a Series is not the pandas way of transforming it; you should use apply().

To update your code, move everything inside the loop into a function:

def parse_tweet(tweet):
    ## everything from loop goes here
    return tweet


Then, instead of:

tweet_text = df.ix[:, 2]

use:

df.iloc[:, 2] = df.iloc[:, 2].apply(parse_tweet)
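A minimal runnable sketch of this approach, assuming a small made-up DataFrame in place of sample-tweets.csv and only some of the question's cleaning steps inside parse_tweet. The key point is that apply() returns a new Series, which has to be assigned back to the column:

```python
import re
import string

import pandas as pd


def parse_tweet(tweet):
    # A subset of the cleaning steps from the question's loop (illustrative only)
    tweet = re.sub(r'https://t\.co/[a-zA-Z0-9]*', '', tweet)  # drop t.co links
    tweet = re.sub(r'@[a-zA-Z0-9]*', '', tweet)               # drop @mentions
    tweet = re.sub(r'#[a-zA-Z0-9]*', '', tweet)               # drop #hashtags
    tweet = ''.join(c for c in tweet if c not in string.punctuation)
    # lowercase and collapse repeated whitespace
    return ' '.join(tweet.lower().split())


# Hypothetical sample data standing in for sample-tweets.csv
df = pd.DataFrame({
    'Tweet_Date': ['2017-01-01', '2017-01-02'],
    'User_ID': [1, 2],
    'Tweet_Text': ['Hello @bob! See https://t.co/abc123', 'Great #day, everyone.'],
})

# apply() builds a new Series; assigning it back is what changes the frame
df.iloc[:, 2] = df.iloc[:, 2].apply(parse_tweet)
print(df['Tweet_Text'].tolist())  # ['hello see', 'great everyone']
```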


BTW, don't use the .ix indexer: it is deprecated and was removed entirely in pandas 1.0. Use .iloc for positional indexing or .loc for label-based indexing instead.
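For reference, the positional selection df.ix[:, 2] from the question maps onto .iloc (by position) or .loc (by column label). A small sketch with a made-up frame that uses the question's first three column names:

```python
import pandas as pd

# Hypothetical one-row frame with the question's first three columns
df = pd.DataFrame({
    'Tweet_Date': ['2017-01-01'],
    'User_ID': [42],
    'Tweet_Text': ['Some tweet'],
})

by_position = df.iloc[:, 2]          # third column, selected by position
by_label = df.loc[:, 'Tweet_Text']   # same column, selected by label
by_name = df['Tweet_Text']           # shorthand for selecting one column

print(by_position.equals(by_label), by_label.equals(by_name))  # True True
```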



Python strings are immutable. You only rebind the variable tweet to a new string, but you never update the actual DataFrame.
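A tiny sketch of the problem in isolation (the two-element Series is made up): rebinding the loop variable creates a new string under a local name and leaves the Series unchanged.

```python
import pandas as pd

s = pd.Series(['A', 'B'])

for value in s:
    value = value.lower()  # rebinds the local name only; s is untouched

print(s.tolist())  # ['A', 'B']
```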

You need to write the updated value back into your frame. A simple fix:

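for i, tweet in enumerate(tweet_text):
    tweet = re.sub(r'https://t\.co/[a-zA-Z0-9]*', '', tweet)
    tweet = re.sub(r'@[a-zA-Z0-9]*', '', tweet)

    # ... remaining cleaning steps from the question ...

    # write the cleaned value back into the DataFrame (row i, column 2, by position)
    df.iloc[i, 2] = tweet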
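An alternative loop that writes back by index label instead of by row position, using Series.items() and DataFrame.at. The non-default index and the shortened cleaning steps are assumptions made up for illustration; the index is chosen so that labels and positions differ:

```python
import re

import pandas as pd

# Hypothetical frame whose index labels (10, 20) differ from positions (0, 1)
df = pd.DataFrame(
    {'Tweet_Text': ['Hi @amy #fun', 'Bye @bob']},
    index=[10, 20],
)

for label, tweet in df['Tweet_Text'].items():
    tweet = re.sub(r'@[a-zA-Z0-9]*', '', tweet)  # drop @mentions
    tweet = re.sub(r'#[a-zA-Z0-9]*', '', tweet)  # drop #hashtags
    tweet = ' '.join(tweet.split())              # collapse whitespace
    df.at[label, 'Tweet_Text'] = tweet           # write back by index label

print(df['Tweet_Text'].tolist())  # ['Hi', 'Bye']
```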



