Stemming Dutch words with the Kraaij-Pohlmann algorithm
I am trying to stem Dutch words in a corpus in R. I found the SnowballC package, but it doesn't work well for Dutch. For example:
wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "porter")
[1] "huis" "huiz" "huisj" "huisjes"
wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "dutch")
[1] "hui" "huizen" "huisj" "huisj"
After some searching, I found that the Kraaij-Pohlmann algorithm might be better suited to Dutch. Is there a way to use it in R? So far, I have not been able to find a package or script that implements it. Other tips and ideas are also welcome!