Stemming Dutch words with the Kraaij-Pohlmann algorithm

Question

I am trying to stem Dutch words in a corpus in R. I found the SnowballC package, but it doesn't work well for Dutch. For example:

library(SnowballC)

wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "porter")
[1] "huis"    "huiz"    "huisj"   "huisjes"

wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "dutch")
[1] "hui"    "huizen" "huisj"  "huisj"

After some searching, I found that the Kraaij-Pohlmann algorithm might be better suited to Dutch. Is there a way to use or implement it in R? So far, I have not been able to find a package or script that does this. Other tips and ideas are also welcome!

r stemming

Charlotte, June 25 '17 at 11:58

No one has answered this question yet
