How do I remove synonyms?

I am writing code in Python 3.4.3 for a linguistic program. This part of my code should remove the next word if it is synonymous with the previous word. First, we need to create a list of synonyms for each word. Then we convert all of those lists to sets. Finally, we need to compare the sets to see whether they share any synonyms, but I don't know how to compare them. If a word is followed by a synonym, only one of the two words should remain.

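from nltk.corpus import wordnet

text = ['', '', '']  # placeholder: the input sentences
text4 = []

def f4(text):
    global text4

    # one list of WordNet lemma names per word, in text order
    synonyms = []
    for sentence in text:
        for word in sentence.split(' '):
            # collected lemma names; kept separate from `syn`, which the loop binds to each Synset
            word_synonyms = []
            for syn in wordnet.synsets(word):
                for lemma in syn.lemmas():
                    word_synonyms.append(lemma.name())
            synonyms.append(word_synonyms)

    # convert each list of synonyms to a set
    synonyms2 = []
    for x in synonyms:
        synonyms2.append(set(x))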
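The missing comparison step can be sketched as follows, assuming two words count as synonyms when their sets of WordNet lemma names overlap (a loose criterion; requiring a shared synset would be stricter). `wordnet_synonyms` and `drop_synonym_followers` are hypothetical names, and the lookup function is a parameter so the filtering logic can be tried without NLTK installed.

```python
def wordnet_synonyms(word):
    """Return the set of WordNet lemma names for `word` (needs NLTK and the wordnet corpus)."""
    from nltk.corpus import wordnet  # imported lazily so the filter below works without NLTK
    return {lemma.name() for synset in wordnet.synsets(word) for lemma in synset.lemmas()}


def drop_synonym_followers(sentence, synonyms_of=wordnet_synonyms):
    """Drop each word whose synonym set overlaps the set of the last word kept.

    `synonyms_of` is an assumption of this sketch: any function mapping a word
    to a set of synonyms (WordNet lemma names by default).
    """
    kept = []
    previous_set = set()
    for word in sentence.split():
        word_set = synonyms_of(word)
        if kept and not word_set.isdisjoint(previous_set):
            continue  # shares a synonym with the previous kept word, so skip it
        kept.append(word)
        previous_set = word_set
    return ' '.join(kept)
```

For example, with a toy lookup where `'run'` and `'race'` both map to `{'run', 'race'}` and unknown words map to an empty set, `'run race home'` becomes `'run home'`. Words that are not in the lookup have an empty set, so they are never treated as synonyms of anything.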

      



2 answers


You want your code to remove the next word if it is synonymous with the previous word.

I would suggest a different algorithm. Here's an example:

text = 'run race stroll rush nice lovely mean kind'  # example text
synonyms = []  # contains a list of synonym lists
synonyms.append(['run', 'race', 'rush'])  # run synonyms
synonyms.append(['nice', 'lovely', 'kind'])  # nice synonyms

def in_synonyms(list_of_synonym_lists, word):
    """ Returns index of the synonym list the word is in; -1 if it isn't found. """
    for index, synonym_list in enumerate(list_of_synonym_lists):
        if word in synonym_list:
            return index
    return -1

# The algorithm
split_text = text.split()
index = 1
while index < len(split_text):
    current = in_synonyms(synonyms, split_text[index])
    if current != -1 and current == in_synonyms(synonyms, split_text[index - 1]):
        # the word before is in the same synonyms list as the current one,
        # so we delete the current one and start over again
        del split_text[index]
        index = 1  # restart the algorithm
    else:
        # continue on forward, including when the word is in no synonyms list
        index += 1
text = ' '.join(split_text)

      



This code:

  • Creates a list of lists of synonyms
  • Iterates through the words of the text
    • If the previous word is in the same list of synonyms as the current one, we delete the current one and restart the algorithm
    • Otherwise, it moves on to the next word
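Because a deletion only affects the words after the kept one, restarting from index 1 should not be necessary: comparing each word with the last word kept ought to give the same result in a single pass. The sketch below does that and also replaces the linear `in_synonyms` search with a dictionary from word to list index; `build_group_index` and `remove_adjacent_synonyms` are hypothetical names, not part of the answer above.

```python
def build_group_index(synonym_lists):
    """Map each word to the index of the first synonym list containing it."""
    group_of = {}
    for index, synonym_list in enumerate(synonym_lists):
        for word in synonym_list:
            group_of.setdefault(word, index)  # first list wins, as in in_synonyms
    return group_of


def remove_adjacent_synonyms(text, synonym_lists):
    """Keep only the first word of each run of words from the same synonym list."""
    group_of = build_group_index(synonym_lists)
    kept = []
    for word in text.split():
        group = group_of.get(word)  # None plays the role of -1
        if kept and group is not None and group == group_of.get(kept[-1]):
            continue  # same list as the previous kept word: drop it
        kept.append(word)
    return ' '.join(kept)
```

With the example lists above, `remove_adjacent_synonyms('run race stroll rush nice lovely mean kind', synonyms)` returns `'run stroll rush nice mean kind'`; `rush` survives because the word before it, `stroll`, is in no list.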

I haven't tested it yet, but I hope you get the idea.



If you want to filter out words that are repetitions, tautologies, or synonyms of the previous word:

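# `sentence` is the input string; `synonymous(a, b)` is a function you supply
# that returns True when a and b are synonyms (for example, built on WordNet).
filtered = []
previous_word = None
for word in sentence.split(' '):
    if previous_word and synonymous(word, previous_word):
        continue  # skip the synonym; previous_word stays the last word kept
    filtered.append(word)
    previous_word = word

new_sentence = ' '.join(filtered)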

      



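You can also do this in one expression, using a generator expression instead of a loop:

words = sentence.split(' ')
# pair each word with the word before it; the first word is paired with None,
# so synonymous(word, None) must return False
new_sentence = ' '.join(word for word, previous in zip(words, [None] + words)
                        if not synonymous(word, previous))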
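Both versions leave `synonymous` undefined. A minimal sketch, assuming two words are synonyms when they share at least one WordNet synset; the `synsets` parameter is a hypothetical hook so the logic can be exercised without NLTK. Because of `[None] + words`, the first word is compared with `None`, and that comparison has to return False.

```python
def synonymous(word, previous, synsets=None):
    """Return True if `word` and `previous` share a WordNet synset.

    `synsets` is an assumption of this sketch: a function mapping a word to its
    synsets, defaulting to nltk.corpus.wordnet.synsets.
    """
    if previous is None:
        return False  # the first word has nothing before it
    if word == previous:
        return True  # treat a repeated word as a synonym of itself
    if synsets is None:
        from nltk.corpus import wordnet  # imported lazily; needs the wordnet corpus
        synsets = wordnet.synsets
    return not set(synsets(word)).isdisjoint(synsets(previous))
```

The loop and the one-expression version are not quite equivalent: the loop compares each word with the last word it kept, while the expression compares it with the word right before it in the original sentence. If `a` and `b` are synonyms, `b` and `c` are synonyms, but `a` and `c` are not, the loop keeps `c` and the expression drops it.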

      







