Separating suffixes with a dictionary
I need to separate all possible suffixes (about 1000) from a given word. I am thinking about using dict.
That being said, I would have suffixes as keys (and some additional information about suffixes as values needed for the further process). If the longest suffix is 4 letters long, I would look for a dict for all possible combinations. For example: Given the word: "abcdefg", I would look for a dict for "g", "fg", "efg" and "defg".
I've done some research and haven't found many similar uses for dict. Could this be a viable solution or am I missing something? Help a lot appriciated.
source to share
If the suffixes aren't too long, your solution sounds great - it's just a few words for word dictionaries and dictionary searches are fast. I don't think a more complex solution (like using a trie) would be worth it here. You can also use typing instead of a dictionary to remove a suffix, but since you need more information for each suffix, a dictionary seems like a natural choice.
source to share
The easiest (maybe not the fastest) way is to find all the matches in the list. With 1000 elements, you shouldn't have much of a performance issue.
>>> sufx = ['foo', 'bar']
>>> [s for s in sufx if 'bazbar'.endswith(s)]
['bar']
>>>[s for s in sufx if 'bazbaz'.endswith(s)]
[]
>>> [s for s in sufx if 'bazfoo'.endswith(s)]
['foo']
source to share
Easy way to do it (tested in python 3):
>>> from collections import deque
>>> word = "antidisestablishmentarianism"
>>> suffixes = {'ism': 3, 'anism': 6, 'ment': 4, 'arianism': 12}
>>> suffix = deque()
>>> longest = None
>>> for char in reversed(word):
... suffix.appendleft(char)
... suf = ''.join(suffix)
... if suf in suffixes:
... longest = suf
...
>>> longest
'arianism'
source to share
I'm not sure if I understand your usecase correctly. I am assuming that it is about you handling suffixes and are hard to spot.
The typical approach (usually in indexing situations) is to turn your string and treat the suffix as a prefix. Then you can do a simple binary search on a sorted list of your backsigned suffixes (thus prefixes).
source to share
If I understand what you want to do, you should use the re module in the standard library.
The docs are here:
http://docs.python.org/library/re.html#module-re
Here's an example of an adverb here:
http://docs.python.org/library/re.html#finding-all-adverbs
How to store them as keys in a dict seems fine to me. Especially if you want to do some other processing for words that have suffixes that you care about.
source to share