Error in enc2utf8 (x): argumemt is not a character vector

Error in enc2utf8(x) : argumemt is not a character vector

is the error I get when I try to run the code below in R 3.1.2. Can someone please help me understand whether I am missing something?

OS: Windows

# Text cleaning with tm
clean <- function(text) {
  library(NLP)
  library(tm)
  sample <- Corpus(VectorSource(text), readerControl = list(language = "english"))
  sample <- tm_map(sample, function(x) iconv(enc2utf8(x), sub = "bytes"))
  sample <- tm_map(sample, removePunctuation)
  sample <- tm_map(sample, stripWhitespace)
  sample <- tm_map(sample, removeNumbers)
  sample <- tm_map(sample, removeWords, stopwords('smart'))
  sample <- tm_map(sample, stripWhitespace)
  dtm <- DocumentTermMatrix(sample)
  return(list(sample, dtm))
}

fileName <- 'input.txt'
test <- readChar(fileName, file.info(fileName)$size)
clean(test)

      



2 answers


You must refer to the content of the corpus, i.e. the character vector in sample$content:

tm_map(sample, function(x) iconv(enc2utf8(x$content), sub = "bytes"))

      



Here I replaced enc2utf8(x) with enc2utf8(x$content).
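To illustrate why the original call fails, here is a minimal base R sketch. The `doc` list is only a hypothetical stand-in for a tm document (a list-like object with a `content` field); it is an assumption for illustration, not the real tm class. It shows that `enc2utf8()` rejects anything that is not a character vector, and accepts the `content` field.

```r
# Hypothetical stand-in for a tm document: a list with a `content` field.
doc <- list(content = "some raw text", meta = list(language = "en"))

# Passing the whole object fails: it is a list, not a character vector.
msg <- tryCatch(enc2utf8(doc), error = function(e) conditionMessage(e))
print(msg)

# Passing the character vector stored in `content` works.
cleaned <- iconv(enc2utf8(doc$content), sub = "bytes")
print(cleaned)
```

Note that a function written this way returns a plain character vector rather than a document object, so later tm steps may expect the document to be rebuilt.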



Hi, a small change in line 2 below may solve your problem:



sample <- VCorpus(VectorSource(text), readerControl = list(language = "english"))
sample <- tm_map(sample, content_transformer(function(x) iconv(enc2utf8(x), sub = "bytes")))
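A short end-to-end sketch of this approach, assuming tm 0.6 or later is installed. The two input strings and the names `corpus` and `to_utf8` are made up for illustration. The point is that `content_transformer()` applies the function to each document's content and keeps the document object intact, so later steps such as `DocumentTermMatrix()` should still work.

```r
library(tm)  # assumes tm >= 0.6 is installed

# Made-up input text for illustration.
text <- c("First sample document, with punctuation!",
          "Second sample: 123 numbers.")

corpus <- VCorpus(VectorSource(text), readerControl = list(language = "english"))

# content_transformer() wraps a character-vector function so that it
# reads and writes the content of each document.
to_utf8 <- content_transformer(function(x) iconv(enc2utf8(x), sub = "bytes"))
corpus <- tm_map(corpus, to_utf8)
corpus <- tm_map(corpus, removePunctuation)

# The documents are still text documents, so the matrix can be built.
dtm <- DocumentTermMatrix(corpus)
print(dim(dtm))
```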
