Code for replacing emoticons with "SAD" or "HAPPY" does not work as expected

Question


I want to replace every happy emoticon in a text file with "HAPPY" and every sad emoticon with "SAD". The code below does detect emoticons (so far only :-)), but in the example it does not replace the emoticon with the text. It only adds the text, and it adds it twice, for reasons I don't understand.

dict_sad={":-(":"SAD", ":(":"SAD", ":-|":"SAD",  ";-(":"SAD", ";-<":"SAD", "|-{":"SAD"}
dict_happy={":-)":"HAPPY",":)":"HAPPY", ":o)":"HAPPY",":-}":"HAPPY",";-}":"HAPPY",":->":"HAPPY",";-)":"HAPPY"}

#THE INPUT TEXT#
a="guys beautifully done :-)" 

for i in a.split():
    for j in dict_happy.keys():
        if set(j).issubset(set(i)):
            print "HAPPY"
            continue
    for k in dict_sad.keys():
        if set(k).issubset(set(i)):
            print "SAD"
            continue
    if str(i)==i.decode('utf-8','replace'):
       print i

INPUT TEXT

a="guys beautifully done :-)"

OUTPUT ("HAPPY" is printed twice, and the smiley is not removed)

guys
beautifully
done
HAPPY
HAPPY
:-)

EXPECTED OUTPUT

guys
beautifully
done
HAPPY


python text-processing nltk

rzach, 17 Nov '14 at 9:45


2 answers

I used lists instead of dictionaries; it makes the code a little simpler:

list_sad = [":(", ":-("]
list_happy = [":)", ":-)"]

a = "guys beautifully done :-)" 

for i in a.split():
    if i in list_sad:
        print ("SAD")
    elif i in list_happy:
        print ("HAPPY")
    else:
        print (i)
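The question asks for the emoticons to be replaced in the text rather than printed word by word. A minimal Python 3 sketch that keeps the same if/elif logic but collects the words and joins them back into one string could look like this; replace_emoticons is a hypothetical helper name, not part of the answer above.

```python
list_sad = [":(", ":-("]
list_happy = [":)", ":-)"]


def replace_emoticons(text):
    """Hypothetical helper: return text with emoticon words swapped for SAD/HAPPY."""
    out = []
    for word in text.split():
        if word in list_sad:
            out.append("SAD")
        elif word in list_happy:
            out.append("HAPPY")
        else:
            out.append(word)
    # split() drops the original whitespace, so words are rejoined with single spaces
    return " ".join(out)


print(replace_emoticons("guys beautifully done :-)"))
# guys beautifully done HAPPY
```

Because str.split() with no argument splits on any run of whitespace, repeated spaces in the input collapse to one space in the result.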


Rik Verbeek, 17 Nov '14 at 9:55


Martijn Pieters · Accepted Answer · 2014-11-17 09:49

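You turn every word and every emoticon into a set, which means you are only checking whether the individual characters match, in any order. For example, set(":)") is a subset of set(":-)"), so both :-) and :) count as a hit for the word :-), which is why HAPPY is printed twice. You probably wanted exact matches: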

for i in a.split():
    for j in dict_happy:
        if j == i:
            print "HAPPY"
            break  # stop after the first (and only possible) exact match
    for k in dict_sad:
        if k == i:
            print "SAD"
            break
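To see the difference between the two tests concretely, here is a small Python 3 check that lists which happy emoticons each test accepts for the word :-). It assumes Python 3.7 or later, so the dictionary keeps insertion order and the printed lists come out in a predictable order.

```python
dict_happy = {":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY", ":-}": "HAPPY",
              ";-}": "HAPPY", ":->": "HAPPY", ";-)": "HAPPY"}

word = ":-)"

# Character-set test from the question: passes whenever every character of the
# emoticon appears somewhere in the word, regardless of order or repetition.
subset_hits = [e for e in dict_happy if set(e).issubset(set(word))]

# Exact comparison from the answer: passes only for the identical string.
exact_hits = [e for e in dict_happy if e == word]

print(subset_hits)  # [':-)', ':)']
print(exact_hits)   # [':-)']
```

Two keys pass the subset test, which matches the two HAPPY lines in the question's output.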

You can iterate over dictionaries directly; there is no need to call .keys(). You are also not actually using the dictionary values, so you could just do:

for word in a.split():
    if word in dict_happy:
        print "HAPPY"
    if word in dict_sad:
        print "SAD"

You can go one step further with set intersections, which boils it down to:

words = set(a.split())
if dict_happy.viewkeys() & words:
    print "HAPPY"
if dict_sad.viewkeys() & words:
    print "SAD"

This uses the dictionary's keys view (returned by dict.viewkeys() in Python 2.7) as a set. Better still, use sets from the start:

sad_emoticons = {":-(", ":(", ":-|", ";-(", ";-<", "|-{"}
happy_emoticons = {":-)", ":)", ":o)", ":-}", ";-}", ":->", ";-)"}

words = set(a.split())
if happy_emoticons & words:
    print "HAPPY"
if sad_emoticons & words:
    print "SAD"
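The viewkeys() method used in the dictionary version exists only in Python 2.7. In Python 3, dict.keys() already returns a set-like view that supports &, so a Python 3 sketch of the same check (keeping the question's dictionaries) could be:

```python
dict_sad = {":-(": "SAD", ":(": "SAD", ":-|": "SAD",
            ";-(": "SAD", ";-<": "SAD", "|-{": "SAD"}
dict_happy = {":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY", ":-}": "HAPPY",
              ";-}": "HAPPY", ":->": "HAPPY", ";-)": "HAPPY"}

a = "guys beautifully done :-)"
words = set(a.split())

# dict.keys() is a set-like view in Python 3, so & returns a plain set
# containing the emoticons that appear as whole words in the text.
found_happy = dict_happy.keys() & words
found_sad = dict_sad.keys() & words

if found_happy:
    print("HAPPY")
if found_sad:
    print("SAD")
```

Keeping the intersection result in a variable also tells you which emoticons were found, not just whether any were.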

If you want to replace the emoticons in the text rather than just detect them, you have to filter the words:

for word in a.split():
    if word in dict_happy:
        print "HAPPY"
    elif word in dict_sad:
        print "SAD"
    else:
        print word

or, better yet, combine the two dictionaries and use dict.get():

emoticons = {
    ":-(": "SAD", ":(": "SAD", ":-|": "SAD", 
    ";-(": "SAD", ";-<": "SAD", "|-{": "SAD",
    ":-)": "HAPPY",":)": "HAPPY", ":o)": "HAPPY",
    ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
    ";-)": "HAPPY"
}

for word in a.split():
    print emoticons.get(word, word)

Here I pass the current word both as the key to look up and as the default value: if the current word is not an emoticon, the word itself is printed; otherwise SAD or HAPPY is printed instead.
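Since the question mentions a text file, here is a Python 3 sketch that applies the same dict.get() lookup line by line, so line breaks survive. The io.StringIO object stands in for an opened file such as open("input.txt"), which is a hypothetical file name; replace_in_lines is likewise a hypothetical helper.

```python
import io

emoticons = {
    ":-(": "SAD", ":(": "SAD", ":-|": "SAD",
    ";-(": "SAD", ";-<": "SAD", "|-{": "SAD",
    ":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY",
    ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
    ";-)": "HAPPY",
}


def replace_in_lines(lines):
    """Hypothetical helper: yield each line with emoticon words replaced."""
    for line in lines:
        # split() also strips the trailing newline; spaces inside a line collapse to one
        yield " ".join(emoticons.get(word, word) for word in line.split())


# io.StringIO behaves like a file opened in text mode: iterating yields lines
sample = io.StringIO("guys beautifully done :-)\nnot great :-(\n")
result = list(replace_in_lines(sample))
print(result)  # ['guys beautifully done HAPPY', 'not great SAD']
```

Like the answers above, this only matches emoticons that stand alone as whitespace-separated words; an emoticon glued to a word (for example done:-)) is left unchanged.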
