Extract ngrams with R
I am trying to extract 3-grams (trigrams) from some Nirvana lyrics, and for this I am using the ngramrr package.
require(ngramrr)
require(tm)
require(magrittr)
nirvana <- c("hello hello hello how low", "hello hello hello how low",
"hello hello hello how low", "hello hello hello",
"with the lights out", "it less dangerous", "here we are now", "entertain us",
"i feel stupid", "and contagious", "here we are now", "entertain us",
"a mulatto", "an albino", "a mosquito", "my libido", "yeah", "hey yay")
ngramrr(nirvana[1], ngmax = 3)
Corpus(VectorSource(nirvana))
I get this result:
[1] "hello" "hello" "hello" "how" "low" "hello hello" "hello hello"
[8] "hello how" "how low" "hello hello hello" "hello hello how" "hello how low"
I would like to know how to build a TermDocumentMatrix
in which the terms are the trigrams.
Thanks.
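
One possible direction, offered only as a sketch: tm lets you pass a custom tokenizer through the `tokenize` entry of the `control` list, so a tokenizer that returns only trigrams could be plugged in there. This assumes `ngramrr()` accepts an `ngmin` argument alongside `ngmax`. The helper name `trigram_tokenizer` is made up for illustration. `VCorpus` is used instead of `Corpus` on the assumption that recent tm versions may ignore custom tokenizers on the simple corpus that `Corpus(VectorSource(...))` creates.

```r
library(ngramrr)
library(tm)

nirvana <- c("hello hello hello how low", "hello hello hello how low",
             "hello hello hello how low", "hello hello hello",
             "with the lights out", "it less dangerous", "here we are now", "entertain us",
             "i feel stupid", "and contagious", "here we are now", "entertain us",
             "a mulatto", "an albino", "a mosquito", "my libido", "yeah", "hey yay")

# Hypothetical helper: keep only word 3-grams by setting ngmin = ngmax = 3
# (assumes ngramrr() exposes both arguments).
trigram_tokenizer <- function(x) ngramrr(x, ngmin = 3, ngmax = 3)

# VCorpus rather than Corpus, so that the custom tokenizer is honoured.
corpus <- VCorpus(VectorSource(nirvana))

# Terms (rows) are produced by the tokenizer; documents are the columns.
tdm <- TermDocumentMatrix(corpus, control = list(tokenize = trigram_tokenizer))

inspect(tdm)
```

Documents with fewer than three words, such as "yeah", have no trigrams, so it is worth checking how they show up in the resulting matrix.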