Error in enc2utf8(x) : argumemt is not a character vector

Question

Error in enc2utf8(x) : argumemt is not a character vector

This is the error I get when I try to run the code below in R 3.1.2. Can someone please help me understand what I am missing?

OS: Windows

# Text cleaning with tm
library(NLP)
library(tm)
clean <- function(text) {
  sample <- Corpus(VectorSource(text), readerControl = list(language = "english"))
  # This is the line that raises the enc2utf8() error
  sample <- tm_map(sample, function(x) iconv(enc2utf8(x), sub = "bytes"))
  sample <- tm_map(sample, removePunctuation)
  sample <- tm_map(sample, stripWhitespace)
  sample <- tm_map(sample, removeNumbers)
  sample <- tm_map(sample, removeWords, stopwords("SMART"))
  sample <- tm_map(sample, stripWhitespace)
  dtm <- DocumentTermMatrix(sample)
  return(list(sample, dtm))
}
fileName <- "input.txt"
test <- readChar(fileName, file.info(fileName)$size)
clean(test)

r text-mining

dagan, Dec 15 '14 at 5:54

2 answers

A small change to the corpus creation and the encoding step may solve your problem: build the corpus with VCorpus() and wrap the function in content_transformer():

sample <- VCorpus(VectorSource(text), readerControl = list(language = "english"))
sample <- tm_map(sample, content_transformer(function(x) iconv(enc2utf8(x), sub = "bytes")))
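Putting these two changes into the question's clean() function gives roughly the sketch below. It is a hedged sketch, assuming a recent version of the tm package: it takes the text as an argument instead of reading input.txt, the two sample sentences are invented for illustration, and it spells the stopword list as stopwords("SMART"), the capitalization used in the tm documentation.

```r
# Sketch of the question's pipeline using VCorpus() + content_transformer().
# Assumes the tm package (and its dependency NLP) is installed.
library(tm)

clean <- function(text) {
  corpus <- VCorpus(VectorSource(text), readerControl = list(language = "english"))
  # content_transformer() hands each document's text (a character vector) to
  # the function and stores the result back in the document, so the corpus
  # keeps its PlainTextDocument structure for the later steps.
  to_utf8 <- content_transformer(function(x) iconv(enc2utf8(x), sub = "bytes"))
  corpus <- tm_map(corpus, to_utf8)
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, removeWords, stopwords("SMART"))
  corpus <- tm_map(corpus, stripWhitespace)
  dtm <- DocumentTermMatrix(corpus)
  list(corpus = corpus, dtm = dtm)
}

# Hypothetical input text, used only to exercise the function
result <- clean(c("The 3 quick brown foxes!", "Another document, with punctuation."))
inspect(result$dtm)
```

Because every step still receives a text document, DocumentTermMatrix() can build the matrix at the end without further conversion.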

SPRASHA6, May 26 '19 at 4:14 am

Sven Hohenstein · Accepted Answer · 2014-12-15T09:08:18+0000

You must refer to the content of each document in the corpus, i.e. the character vector stored in x$content:

sample <- tm_map(sample, function(x) iconv(enc2utf8(x$content), sub = "bytes"))

Here I replaced enc2utf8(x) with enc2utf8(x$content).
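The cause of the error can be checked in a few lines: for a VCorpus, each element is a whole document object rather than a string, and enc2utf8() only accepts character vectors. The snippet below is a minimal sketch, assuming the tm package is installed; the two example texts are invented for illustration.

```r
# Minimal reproduction of why enc2utf8(x) fails inside tm_map().
# Assumes the tm package is installed.
library(tm)

corpus <- VCorpus(VectorSource(c("Hello world", "Second document")))
doc <- corpus[[1]]

class(doc)                  # a PlainTextDocument object, not a character vector
is.character(doc)           # FALSE
is.character(doc$content)   # TRUE: the text itself lives in doc$content

# enc2utf8() rejects the document object, which is the error from the question
msg <- tryCatch(enc2utf8(doc), error = function(e) conditionMessage(e))
print(msg)

# Passing the character content works
enc2utf8(doc$content)       # "Hello world"
```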