data.frame(cbind(...)) versus data.frame(...) in R

What is the difference between using

data.frame(a,b,c,y)

and

data.frame(cbind(a,b,c,y))

I have three vectors, a, b and c, that contain factors (text), and one, y, that stores numbers.

Depending on which form I use, I get different results when fitting this model:

model.glm <- glm(y ~ a * b * c, data=blabla, family=poisson)

My guess is that one of them turns the factors into something that is no longer a factor, but I'm not sure. Which way is correct?

2 answers


cbind returns a matrix by default, and a matrix can hold only one data type. Mixed data types (such as numeric and character) are coerced to a common type, which for numbers and text is character. For example:

a <- 1:3
b <- c("a", "b", "c")
cb <- cbind(a,b)
cb
     a   b
[1,] "1" "a"
[2,] "2" "b"
[3,] "3" "c"
class(cb)   # R >= 4.0.0; older versions print only "matrix"
[1] "matrix" "array"
typeof(cb)
[1] "character"

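When you pass this matrix to data.frame, its character columns are converted to factors when stringsAsFactors = TRUE, which was the default before R 4.0.0 (set the argument to FALSE to suppress this; from R 4.0.0 on, FALSE is the default). A factor is essentially an integer vector of codes whose string labels are stored as levels: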



# Output shown is the pre-R 4.0.0 default; in newer R, add stringsAsFactors = TRUE to reproduce it
df <- data.frame(cb)
typeof(df$a)
[1] "integer"
typeof(df$b)
[1] "integer"
class(df$a)
[1] "factor"
class(df$b)
[1] "factor"
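The same coercion reaches a numeric response such as y in the question, and a factor built from numbers does not convert back the way one might expect. The sketch below uses a hypothetical response y = c(10, 8, 9) and a made-up text column g, and passes stringsAsFactors = TRUE explicitly so that it behaves the same on any R version:

```r
# Hypothetical data: a numeric response and a text column
y <- c(10, 8, 9)
g <- c("x", "y", "z")

# cbind() builds a character matrix, so y becomes "10" "8" "9"
m <- cbind(y, g)
df <- data.frame(m, stringsAsFactors = TRUE)

levels(df$y)                    # "10" "8" "9": levels are sorted as strings
as.integer(df$y)                # 1 2 3: the factor codes, not the original values
as.numeric(as.character(df$y))  # 10 8 9: the original numbers, recovered via the labels
```

A Poisson glm() needs y as a number, so a column like this has to be converted back before it can serve as the response.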

I assume this is not the behavior you want. Since data.frame combines the columns for you just as cbind would, while keeping their original types (apart from converting strings to factors, which, as noted, can be suppressed), I would stick with the simpler construction data.frame(a,b).
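A quick way to check this recommendation is to build the data frame directly from the vectors and inspect each column's class. This sketch reuses a and b from the example above; the explicit stringsAsFactors = TRUE call shows that only the string column is affected by that setting:

```r
a <- 1:3
b <- c("a", "b", "c")

# Direct construction: each vector becomes a column with its own type
df_direct <- data.frame(a, b)
sapply(df_direct, class)   # a is "integer"; b is "character" (R >= 4.0.0) or "factor" (older R)

# Even with string conversion switched on, a stays an integer column
df_fac <- data.frame(a, b, stringsAsFactors = TRUE)
sapply(df_fac, class)      # a "integer", b "factor"
```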

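cbind(a,b,c,y) returns a matrix, which cannot hold multiple data types. So if, say, a, b and c are character vectors and y is numeric, every column of the matrix becomes character, and data.frame(cbind(a,b,c,y)) contains only factors (or, from R 4.0.0 on, only character columns), including y. Conversely, if a, b and c are factors, cbind replaces them with their integer codes, so they are no longer factors.

Without cbind(), each vector is added to the data frame as its own column, so y stays numeric and factors stay factors (character vectors become factors only when stringsAsFactors = TRUE).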
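The factor case from the question can be reproduced with a small simulated dataset; the factor a, its levels and the Poisson counts y below are made up for illustration. Because cbind() replaces the factor by its integer codes, glm() then treats a as a single numeric slope instead of a set of level contrasts, which is one way the two constructions give different answers:

```r
# Simulated data (hypothetical): a 3-level factor and Poisson counts
set.seed(1)
a <- factor(rep(c("low", "mid", "high"), each = 4))
y <- rpois(12, lambda = 5)

good <- data.frame(a, y)         # a stays a factor
bad  <- data.frame(cbind(a, y))  # a is replaced by its integer codes

sapply(good, class)  # a "factor", y "integer"
sapply(bad,  class)  # a "integer", y "integer"

# Factor predictor: intercept plus one contrast per non-reference level
fit_good <- glm(y ~ a, data = good, family = poisson)
# Integer codes: intercept plus a single slope
fit_bad  <- glm(y ~ a, data = bad, family = poisson)

length(coef(fit_good))  # 3
length(coef(fit_bad))   # 2
```

The codes follow the alphabetical level order (high = 1, low = 2, mid = 3), so the "slope" in fit_bad has no meaningful interpretation.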
