Stemming Dutch words with the Kraaij-Pohlmann algorithm

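I am trying to stem Dutch words in a corpus in R. I found the SnowballC package, but its results for Dutch are poor: neither the "porter" nor the "dutch" stemmer reduces the inflected forms of "huis" to a common stem. For example: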

library(SnowballC)

wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "porter")
[1] "huis"    "huiz"    "huisj"   "huisjes"

wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "dutch")
[1] "hui"    "huizen" "huisj"  "huisj" 
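
To make the problem concrete, here is a minimal sketch in base R that groups the four word forms by the stem each one received. The helper name `conflation_groups` is hypothetical, and the stem vectors are copied by hand from the output quoted above rather than recomputed, so the snippet runs without SnowballC.

```r
# Hypothetical helper: group surface forms by the stem a stemmer assigned.
conflation_groups <- function(words, stems) {
  stopifnot(length(words) == length(stems))
  split(words, stems)
}

words <- c("huis", "huizen", "huisje", "huisjes")

# Stems copied from the wordStem() output quoted above, not recomputed here.
stems_porter <- c("huis", "huiz", "huisj", "huisjes")
stems_dutch  <- c("hui", "huizen", "huisj", "huisj")

groups_porter <- conflation_groups(words, stems_porter)
groups_dutch  <- conflation_groups(words, stems_dutch)

# A stemmer that conflated all four forms would yield exactly one group.
length(groups_porter)
length(groups_dutch)
```

With the outputs above, the "porter" stems fall into four groups and the "dutch" stems into three, where a stemmer that suited this example would produce one.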

      

After some searching, I found that the Kraaij-Pohlmann algorithm might be better suited to Dutch. Is there a way to use or implement it in R? So far, I have not been able to find a package or script that does this. Other tips and ideas are also welcome!

r stemming


No one has answered this question yet
