Why does replacing multiple data frame columns with a factor insert character data instead?

While trying to plot my data, I encountered unexpected behavior that caused my groups to be incorrectly regrouped and mislabeled.

In short, assigning a factor to multiple columns of a data frame at once coerces it to character instead of keeping it a factor. This is similar to a previously asked question, but I still don't understand why it happens.

# x is a factor
(x = factor(c("red", "blue", "green")))
class(x)

# make a data frame
frame = data.frame("y"=1:3, "z"=1:3)

# replacing one column at a time yields a factor
frame[,"y"] = x; class(frame[,"y"])
frame[,"z"] = x; class(frame[,"z"])

# however, replacing >1 column at a time yields a character
frame[,c("y", "z")] = x
class(frame$y); class(frame$z)

      

Factors in R tend to cause me the most heartburn! Order, combination of numeric value and character level, general indecision ... Anyway, I'm sure this is something I don't understand about the specific properties of data frames. Your help is appreciated!



1 answer


So the problem is in the function [<-.data.frame, which gets executed when you do an assignment like

 frame[,c("y", "z")] = x

      

The problem is that when you specify more than one column, as you have, and the new value is not a list, the function converts the value to a matrix with the correct number of rows and columns and then splits that matrix into a list of columns. Factors cannot be stored in a matrix: matrix() drops the factor class and keeps only the labels as character strings. You can see this if you try

matrix(x, nrow=3, ncol=2)
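
To make the coercion visible, here is a minimal sketch that inspects the result of that call. It reuses x from the question; the name m is only for illustration.

```r
x <- factor(c("red", "blue", "green"))

# matrix() cannot hold a factor: the class and levels are dropped and the
# labels are recycled as plain strings to fill both columns
m <- matrix(x, nrow = 3, ncol = 2)

is.factor(m)   # FALSE
typeof(m)      # "character"
```

This character matrix is what then gets split into columns and written back into the data frame, which would explain why both y and z end up as character.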

      



Again, this conversion happens because you are specifying more than one column and the new value is not a list. So one way around it is to pass a list as the new value instead.

frame[,c("y", "z")] <- list(x)
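
As a quick check of the list approach, the sketch below rebuilds the objects from the question and confirms the class of each column after the assignment. Nothing here is new API; it only combines the calls already shown above.

```r
x <- factor(c("red", "blue", "green"))
frame <- data.frame(y = 1:3, z = 1:3)

# wrapping the factor in a list avoids the matrix conversion;
# the single list element is reused for both columns
frame[, c("y", "z")] <- list(x)

# inspect the class of every column
sapply(frame, class)
```

If you want different factors in each column, the same idea should work with one list element per column, for example list(x, rev(x)).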

      

So, it's a little annoying that factors and matrices don't get along, but once you get the hang of factors, they really are a powerful feature of R. Don't despair!
