Unexpected conversion to characters instead of factors in data frames and matrices
I'm not a novice R user, but I find the following very confusing.
I have a data frame (although the problem is equally present for matrices) of categorical variables taking the values +1 / -1 that I would like to convert to factors.
mat <- matrix(sample(c(-1, +1), 16, replace = TRUE), nrow = 4)
mat <- data.frame(mat)
However, using
mat <- apply(mat, 2, factor)
converts the values to character strings instead of factors:
> mat
     [,1] [,2] [,3] [,4]
[1,] "-1" "1"  "-1" "1"
[2,] "-1" "-1" "-1" "-1"
[3,] "-1" "1"  "1"  "1"
[4,] "-1" "-1" "1"  "1"
Perhaps along the same lines (and I have had this kind of problem with some of my other data), trying to convert character columns of matrices and data frames to factors results in even more confusing behavior:
mat2 <- matrix(sample(letters, 16, replace = TRUE), nrow = 4)
> mat2
     [,1] [,2] [,3] [,4]
[1,] "x"  "m"  "r"  "e"
[2,] "u"  "r"  "b"  "p"
[3,] "j"  "p"  "h"  "j"
[4,] "k"  "s"  "e"  "x"
mat2[,1] <- factor(mat2[,1])
> mat2
     [,1] [,2] [,3] [,4]
[1,] "4"  "m"  "r"  "e"
[2,] "3"  "r"  "b"  "p"
[3,] "1"  "p"  "h"  "j"
[4,] "2"  "s"  "e"  "x"
Any help or clarification would be appreciated.
Always remember that data frames are lists, so working on columns is the same as iterating over list elements. I think you may have intended to do something more like:
mat[] <- lapply(mat,factor)
or this:
as.data.frame(lapply(mat,factor))
Although, even here, note that the levels of each factor are not necessarily the same, because each column is converted independently!
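To spell out what is going on, here is a minimal sketch (the names df, via_apply, df_fixed and m are just illustrative, and the seed is arbitrary). As far as I understand the documentation for apply(), its result is coerced with as.vector() before the dimensions are set, so factor results end up as a character array. Similarly, a matrix can hold only one atomic type, so assigning a factor into a character matrix stores the factor's integer codes as strings. Passing an explicit levels argument through lapply() is one way to give every column the same levels:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(matrix(sample(c(-1, 1), 16, replace = TRUE), nrow = 4))

# apply() works on a matrix and coerces factor results back to a basic
# vector type, so this yields a character matrix, not factors.
via_apply <- apply(df, 2, factor)
is.character(via_apply)

# A character matrix cannot store a factor: the integer codes of the
# factor are coerced to strings on assignment.
m <- matrix(c("x", "u", "j", "k", "m", "r", "p", "s"), nrow = 4)
m[, 1] <- factor(m[, 1])
m[, 1]

# Treat the data frame as a list of columns instead. The extra `levels`
# argument is passed on to factor() so all columns share the same levels.
df_fixed <- df
df_fixed[] <- lapply(df, factor, levels = c(-1, 1))
sapply(df_fixed, is.factor)
lapply(df_fixed, levels)
```

The `df_fixed[] <-` form keeps the object a data frame (with its column names) while replacing each column; without the explicit levels, a column that happens to contain only one of the two values would get a single-level factor.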