data.frame(cbind(...)) versus data.frame(...) in R
What is the difference between using
data.frame(a,b,c,y)
and
data.frame(cbind(a,b,c,y))
I have three vectors, a, b and c, that hold factors (text), and one vector, y, that holds numbers.
Depending on which form I use, I get different results when fitting this model:
model.glm <- glm(y ~ a * b * c, data=blabla, family=poisson)
I'm guessing this is because one of the two forms stops treating the factors as factors, but I'm not sure. Which way is correct?
cbind returns a matrix by default, and a matrix can hold only one data type. Mixed data types (such as numeric and character) are coerced to character. For example:
a <- 1:3
b <- c("a", "b", "c")
cb <- cbind(a,b)
cb
     a   b
[1,] "1" "a"
[2,] "2" "b"
[3,] "3" "c"
class(cb)
[1] "matrix"
typeof(cb)
[1] "character"
When you pass this matrix to data.frame, R versions before 4.0.0 convert the character columns to factors by default (stringsAsFactors = TRUE; set it to FALSE to suppress this behavior, which is the default from R 4.0.0 onward). Factors are essentially integer codes that stand for strings.
df <- data.frame(cb)
typeof(df$a)
[1] "integer"
typeof(df$b)
[1] "integer"
class(df$a)
[1] "factor"
class(df$b)
[1] "factor"
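To make the "integer codes" point concrete, here is a small sketch. The values 10, 20, 30 are made up for illustration, and stringsAsFactors = TRUE is passed explicitly so the result should not depend on the R version.

```r
# Illustrative values only; not taken from the question
a <- c(10, 20, 30)
b <- c("x", "y", "z")

# cbind() coerces everything to character; data.frame() then makes factors
df <- data.frame(cbind(a, b), stringsAsFactors = TRUE)

is.numeric(df$a)                # FALSE: the numbers are now factor levels
levels(df$a)                    # the levels are the strings "10" "20" "30"
as.integer(df$a)                # 1 2 3: the internal codes, not the values
as.numeric(as.character(df$a))  # 10 20 30: the original values recovered
```

Converting such a column back with as.integer or as.numeric alone returns the codes, so going through as.character first is the safer route.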
I assume this is not the behavior you want. data.frame will do the column binding for you while keeping your original types (except for converting strings to factors, which, as noted, can be suppressed), so I would stick with the simpler construction data.frame(a, b).
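As a rough check of the two constructions from the question, the sketch below builds both data frames from made-up data and compares the type of y. The vectors and counts are hypothetical, and only two predictors are used to keep the example small.

```r
# Hypothetical data: two text predictors and a numeric count response
a <- rep(c("a1", "a2"), each = 4)
b <- rep(c("b1", "b2"), times = 4)
y <- c(3, 5, 2, 7, 4, 6, 1, 8)

direct    <- data.frame(a, b, y, stringsAsFactors = TRUE)
via_cbind <- data.frame(cbind(a, b, y), stringsAsFactors = TRUE)

str(direct)              # a and b are factors, y stays numeric
str(via_cbind)           # all three columns are factors, including y

is.numeric(direct$y)     # TRUE
is.numeric(via_cbind$y)  # FALSE

# Fit the model on the data frame that kept y numeric
model.glm <- glm(y ~ a * b, data = direct, family = poisson)
```

Running str on a data frame before fitting is a quick way to confirm that the response is still numeric.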