How do I remove synonyms?
I am writing a linguistic program in Python 3.4.3. This part of my code should remove a word if it is a synonym of the previous word. First, I build a list of synonyms for each word, then I convert each of those lists to a set. Finally, I need to compare the sets to see whether two words share synonyms, but I don't know how to do that. If a word is followed by a synonym, only the first word should remain.
from nltk.corpus import wordnet

text = ['', '', '']
text4 = []

def f4(text):
    global text4
    synonyms = []
    for sentence in text:
        d = ' '
        sentence = sentence.split(d)
        for word in sentence:
            # Lemma names for this word; kept in its own list so the
            # loop variable `syn` below does not overwrite it.
            word_synonyms = []
            for syn in wordnet.synsets(word):
                for lemma in syn.lemmas():
                    word_synonyms.append(lemma.name())
            synonyms.append(word_synonyms)
    # Convert each synonym list to a set
    synonyms2 = []
    for x in synonyms:
        x = set(x)
        synonyms2.append(x)
My code should remove the next word if it is synonymous with the previous word.
I would suggest a different algorithm. Here's an example:
text = 'run race stroll rush nice lovely mean kind'  # example text

synonyms = []  # a list of synonym lists
synonyms.append(['run', 'race', 'rush'])     # synonyms of "run"
synonyms.append(['nice', 'lovely', 'kind'])  # synonyms of "nice"

def in_synonyms(list_of_synonym_lists, word):
    """Return the index of the synonym list the word is in; -1 if it isn't found."""
    for index, synonym_list in enumerate(list_of_synonym_lists):
        if word in synonym_list:
            return index
    return -1

# The algorithm
split_text = text.split()
index = 1
while index < len(split_text):
    current = in_synonyms(synonyms, split_text[index])
    previous = in_synonyms(synonyms, split_text[index - 1])
    if current != -1 and current == previous:
        # The previous word is in the same synonym list as the current one:
        # delete the current word and start over.
        del split_text[index]
        index = 1  # restart the algorithm
    else:
        # Word is in no synonym list, or in a different one: move forward.
        index += 1
text = ' '.join(split_text)
This code:
- Creates a list of synonym lists
- Iterates through the words of the text
- Deletes the current word and restarts the algorithm if the previous word is in the same synonym list
- Otherwise moves on to the next word
I haven't tested it yet, but I hope you get the idea.
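The same idea can also be written as a single pass that compares each word with the last word that was kept, so no restart is needed. This is only a sketch under that assumption; `drop_adjacent_synonyms` and `group_of` are names made up for this illustration, and the synonym groups are the same hand-written lists as in the example text.

```python
def drop_adjacent_synonyms(text, synonym_groups):
    """Drop each word that shares a synonym group with the last kept word."""
    # Map every word to the index of the first group that contains it.
    group_of = {}
    for group_index, group in enumerate(synonym_groups):
        for word in group:
            group_of.setdefault(word, group_index)

    kept = []
    for word in text.split():
        group = group_of.get(word)
        if kept and group is not None and group == group_of.get(kept[-1]):
            continue  # synonym of the last kept word: skip it
        kept.append(word)
    return ' '.join(kept)


groups = [['run', 'race', 'rush'], ['nice', 'lovely', 'kind']]
result = drop_adjacent_synonyms('run race stroll rush nice lovely mean kind', groups)
print(result)  # run stroll rush nice mean kind
```

Note that "rush" and "kind" survive here because the word directly before each of them is not in the same group.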
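If you want to filter out words that are repetitions, tautologies, or synonyms of the previous word:
# `sentence` is the input string; `synonymous(a, b)` is a predicate you
# define that returns True when the two words count as synonyms.
filtered = []
previous_word = None
for word in sentence.split(' '):
    if previous_word and synonymous(word, previous_word):
        continue  # skip a synonym of the last kept word
    else:
        filtered.append(word)
        previous_word = word
' '.join(filtered)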
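You can also do this with a generator expression. Note that it compares each word with the word immediately before it, even if that word was itself removed:
words = sentence.split(' ')
new_sentence = ' '.join(
    word for word, previous in zip(words, [None] + words)
    if previous is None or not synonymous(word, previous))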
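The snippets above leave `synonymous` undefined. One possible definition, following the question's plan of comparing sets, is to call two words synonymous when their sets of WordNet lemma names overlap. The sketch below is an assumption, not the answerer's method: `wordnet_lemmas`, `remove_synonyms`, and the `lemmas_of` parameter are hypothetical names, and the overlap test is deliberately loose (words with many senses will match often). The WordNet lookup needs NLTK and its `wordnet` corpus, so the demo at the end substitutes a toy lookup.

```python
def wordnet_lemmas(word):
    """Return the set of lemma names WordNet lists for `word`."""
    # Imported lazily so the rest of the sketch runs without NLTK installed.
    from nltk.corpus import wordnet
    return {lemma.name()
            for synset in wordnet.synsets(word)
            for lemma in synset.lemmas()}


def synonymous(word, previous, lemmas_of=wordnet_lemmas):
    """True if the words are equal or their lemma-name sets share an element."""
    if previous is None:
        return False
    if word == previous:
        return True  # plain repetition
    return not lemmas_of(word).isdisjoint(lemmas_of(previous))


def remove_synonyms(sentence, lemmas_of=wordnet_lemmas):
    """Keep a word only if it is not synonymous with the last kept word."""
    filtered = []
    for word in sentence.split():
        previous = filtered[-1] if filtered else None
        if not synonymous(word, previous, lemmas_of):
            filtered.append(word)
    return ' '.join(filtered)


# Toy lookup standing in for WordNet (assumption, for demonstration only).
toy = {'big': {'big', 'large'}, 'large': {'large', 'big'}}
toy_lemmas = lambda word: toy.get(word, {word})
print(remove_synonyms('a big large house', toy_lemmas))  # a big house
```

With NLTK and the corpus available, calling `remove_synonyms(sentence)` without the second argument would use the WordNet lookup instead.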