Why does the Porter stemmer give a string that can be stemmed again?

stem('apples') = 'apple'
stem('apple') = 'appl'
stem('appl') = 'appl'

Isn't this a flaw in the stemming algorithm?

(This is using the Porter stemming algorithm.)

+2




2 answers


It looks more like a bug in the implementation of the algorithm you are using.



When I follow the steps in the original algorithm (from the page you linked to), the final "s" of "apples" is removed in step 1a and the "e" in step 5a, so the stem of "apples" is also "appl".
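To make that walkthrough concrete, here is a minimal sketch of just steps 1a and 5a, written from the rules in Porter's published definition. It is not a full stemmer: steps 1b through 4 and 5b are left out on the assumption that they do not change these particular words, and the helper names (`is_consonant`, `measure`, `ends_cvc`, `step_1a`, `step_5a`) are made up for illustration.

```python
VOWELS = set("aeiou")


def is_consonant(word, i):
    """True if word[i] is a consonant in Porter's sense."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' counts as a vowel only when it follows a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Porter's m: the number of VC sequences in [C](VC)^m[V]."""
    pattern = "".join(
        "c" if is_consonant(stem, i) else "v" for i in range(len(stem))
    )
    return pattern.count("vc")


def ends_cvc(stem):
    """Porter's *o condition: ends consonant-vowel-consonant, last not w, x, y."""
    n = len(stem)
    if n < 3:
        return False
    return (
        is_consonant(stem, n - 3)
        and not is_consonant(stem, n - 2)
        and is_consonant(stem, n - 1)
        and stem[-1] not in "wxy"
    )


def step_1a(word):
    """Step 1a: plural endings (SSES -> SS, IES -> I, SS -> SS, S -> '')."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def step_5a(word):
    """Step 5a: drop a final 'e' if m > 1, or if m == 1 and not *o."""
    if word.endswith("e"):
        stem = word[:-1]
        m = measure(stem)
        if m > 1 or (m == 1 and not ends_cvc(stem)):
            return stem
    return word


def partial_stem(word):
    """Apply only steps 1a and 5a (a partial sketch, not the full algorithm)."""
    return step_5a(step_1a(word))


for w in ("apples", "apple", "appl"):
    print(w, "->", partial_stem(w))
```

With these two steps, "apples", "apple", and "appl" all map to "appl", so stemming the result again changes nothing. An implementation that returns "apple" for "apples" is most likely skipping or misapplying step 5a.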

+1




I found a dictionary-assisted implementation of the Porter stemmer algorithm here: http://preciselyconcise.com/apis_and_installations/smart_stemmer.php



This API was very easy to use, and it corrected spelling errors in the original words. I would advise you to use this stemmer, as the API also provides an auto-corrected variant of the word.

0








