Separating suffixes with a dictionary

I need to separate all possible suffixes (about 1000) from a given word. I am thinking about using dict.

That being said, I would have suffixes as keys (and some additional information about suffixes as values ​​needed for the further process). If the longest suffix is ​​4 letters long, I would look for a dict for all possible combinations. For example: Given the word: "abcdefg", I would look for a dict for "g", "fg", "efg" and "defg".

I've done some research and haven't found many similar uses for dict. Could this be a viable solution or am I missing something? Help a lot appriciated.

+3


source to share


5 answers


If the suffixes aren't too long, your solution sounds great - it's just a few words for word dictionaries and dictionary searches are fast. I don't think a more complex solution (like using a trie) would be worth it here. You can also use typing instead of a dictionary to remove a suffix, but since you need more information for each suffix, a dictionary seems like a natural choice.



+3


source


The easiest (maybe not the fastest) way is to find all the matches in the list. With 1000 elements, you shouldn't have much of a performance issue.



>>> sufx = ['foo', 'bar']
>>> [s for s in sufx if 'bazbar'.endswith(s)]
['bar']
>>>[s for s in sufx if 'bazbaz'.endswith(s)]
[]
>>> [s for s in sufx if 'bazfoo'.endswith(s)]
['foo']

      

+1


source


See Time Complexity dict. The lookup time for a dict is pretty fast (O (1) on average!). For this implementation, your average time complexity to find the longest suffix will be O (k ^ 2), where k is your word length. That's k ^ 2 due to the operation ''.join

(a similar O (n) operation such as reverse or string sort would be required since strings do not support the O (1) appendleft operation).

Easy way to do it (tested in python 3):

>>> from collections import deque
>>> word = "antidisestablishmentarianism"
>>> suffixes = {'ism': 3, 'anism': 6, 'ment': 4, 'arianism': 12}
>>> suffix = deque()
>>> longest = None
>>> for char in reversed(word):
...     suffix.appendleft(char)
...     suf = ''.join(suffix)
...     if suf in suffixes:
...         longest = suf
...
>>> longest
'arianism'

      

+1


source


I'm not sure if I understand your usecase correctly. I am assuming that it is about you handling suffixes and are hard to spot.

The typical approach (usually in indexing situations) is to turn your string and treat the suffix as a prefix. Then you can do a simple binary search on a sorted list of your backsigned suffixes (thus prefixes).

0


source


If I understand what you want to do, you should use the re module in the standard library.

The docs are here:

http://docs.python.org/library/re.html#module-re

Here's an example of an adverb here:

http://docs.python.org/library/re.html#finding-all-adverbs

How to store them as keys in a dict seems fine to me. Especially if you want to do some other processing for words that have suffixes that you care about.

0


source







All Articles