Stemming Dutch words with the Kraaij-Pohlmann algorithm

I am trying to stem Dutch words in a corpus in R. I found the SnowballC package, but it doesn't work well for Dutch. For example:

wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "porter")
[1] "huis"    "huiz"    "huisj"   "huisjes"

wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "dutch")
[1] "hui"    "huizen" "huisj"  "huisj" 
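
To make the problem easier to check on a larger word list, here is a minimal sketch that groups words by the stem they receive. The helper name `group_by_stem` is made up for illustration, and it only assumes that a stemmer is a function from a character vector to a character vector of the same length. The SnowballC part runs only if the package is installed.

```r
# Group words by the stem a given stemmer assigns to them.
# `group_by_stem` is a hypothetical helper, not part of any package.
# `stemmer` is any function mapping a character vector to a character
# vector of the same length.
group_by_stem <- function(words, stemmer) {
  stems <- stemmer(words)
  stopifnot(length(stems) == length(words))
  # split() returns a named list: one element per distinct stem
  split(words, stems)
}

words <- c("huis", "huizen", "huisje", "huisjes")

# Plug in the Snowball Dutch stemmer from above, if SnowballC is available.
if (requireNamespace("SnowballC", quietly = TRUE)) {
  dutch <- function(w) SnowballC::wordStem(w, language = "dutch")
  groups <- group_by_stem(words, dutch)
  print(groups)
  # A stemmer that conflates all four forms would yield exactly one group.
  cat("number of distinct stems:", length(groups), "\n")
}
```

Because the stemmer is passed in as a function, any other implementation (for example a Kraaij-Pohlmann one, if it can be found) could be swapped in and compared on the same words.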

      

After some searching, I found that the Kraaij-Pohlmann algorithm might be better suited to Dutch. Is there a way to use it in R? So far, I have not been able to find a package or script that does this. Other tips and ideas are also welcome!
