tibble silently changes the class of recycled difftime variables

If a difftime variable is included in a tibble and its length equals that of the other variable(s), the class of the variable is preserved.

tibble::tibble(a = c(1,2), b = as.difftime(c(1,2), units = "hours"))

# A tibble: 2 x 2
      a       b
  <dbl>  <time>
1     1 1 hours
2     2 2 hours

      

However, if the length of a difftime variable is a proper factor of the length of another variable, so that the difftime variable is recycled, the class of the variable is silently changed to numeric:

tibble::tibble(a = c(1,2), b = as.difftime(1, units = "hours"))

# A tibble: 2 x 2
      a     b
  <dbl> <dbl>
1     1     1
2     2     1

      

Is this difference in behavior there because tidyverse users are encouraged to use the Period or Duration objects provided by lubridate to represent time spans, rather than base R difftime objects? Or is this an unintentional bug?

The same problem occurs when using tibble::data_frame and dplyr::data_frame, although I believe they may be deprecated in the future.

To be clear, the following calls do not change the class of the time-span variable:

tibble::tibble(a = c(1,2), b = lubridate::as.period("1H"))

# A tibble: 2 x 2
      a            b
  <dbl> <S4: Period>
1     1     1H 0M 0S
2     2     1H 0M 0S

tibble::tibble(a = c(1,2), b = lubridate::as.duration("1H"))

# A tibble: 2 x 2
      a                b
  <dbl>   <S4: Duration>
1     1 3600s (~1 hours)
2     2 3600s (~1 hours)

      



2 answers


The behavior you see has to do with a peculiarity of the vector recycling process during data frame creation. As you already know, objects passed to the data.frame function must have the same number of rows. But atomic vectors will be recycled a whole number of times if necessary. This raises the question of why the following doesn't work:

dff <- data.frame(a=c(1,2), b=as.difftime(1, units="hours"))

      

The above code produces the following error:

Error in data.frame(a = c(1, 2), b = as.difftime(1, units = "hours")) :
  arguments imply differing number of rows: 2, 1

It turns out this doesn't work because a difftime object is not recognized as a plain vector: it carries attributes other than names (class and units), so is.vector rejects it. You can check this with the following:

is.vector(as.difftime(1, units="hours"))

      

This returns:

[1] FALSE

      

As a result, when the data.frame function tries to recycle column b, it first checks whether the column is actually a vector (using is.vector). As this returns FALSE, the recycling does not proceed, and hence an error is raised.
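As a quick check of the point above, the following base R sketch shows why is.vector rejects a difftime: the underlying data is an ordinary double vector, but class and units attributes are attached to it. The variable name d is purely illustrative.

```r
# Illustrative object; the name `d` is not from the original answer.
d <- as.difftime(1, units = "hours")

# Underneath, a difftime is an ordinary double vector, so it is atomic.
is.atomic(d)              # TRUE
typeof(d)                 # "double"

# It carries attributes other than names (here: class and units) ...
names(attributes(d))

# ... and is.vector() only accepts vectors with no attributes besides names.
is.vector(d)              # FALSE

# With the attributes stripped, the same value passes the check.
is.vector(as.numeric(d))  # TRUE
```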

So the next question is, why not just convert column b with as.vector?

This would be a good idea, except that as.vector removes all attributes, including names, from the resulting vector. You can see this with the following:



as.vector(as.difftime(1, units="hours"))

      

returns:

[1] 1
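To make the loss of attributes concrete, here is a small base R sketch. It compares the object before and after as.vector, and shows that the unit has to be supplied again to get a difftime back. The names d, stripped and restored are illustrative only.

```r
# Illustrative names; none of these appear in the original answer.
d <- as.difftime(1, units = "hours")

# as.vector() keeps the numeric value but drops class and units.
stripped <- as.vector(d)
attributes(stripped)   # NULL
class(stripped)        # "numeric"

# The unit is no longer stored anywhere, so it must be passed in again
# to rebuild a difftime from the bare number.
restored <- as.difftime(stripped, units = units(d))
class(restored)        # "difftime"
units(restored)        # "hours"
```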

      

All attributes of the difftime object were lost during the coercion. This makes me think that the tibble::data_frame function is actually using as.vector somewhere in the process of building the data_frame. As a result, we see the following behavior:

tibble::data_frame(a=c(1,2), b=as.difftime(1, units="hours"))

      

returns

# A tibble: 2 x 2
      a     b
  <dbl> <dbl>
1     1     1
2     2     1

      

I reach the same conclusion as @agstudy: to preserve the difftime object, you may need to use a list for column b, like this:

tibble::tibble(a = c(1,2), b = list(as.difftime(1, units = "hours")))
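Another possible workaround, not part of the original answer and offered only as a sketch: do the recycling yourself, so that the difftime column already has the full length before the table is built. The question's first example suggests that tibble keeps the class in that case. The snippet below uses data.frame only so that it runs with base R alone.

```r
# Sketch: recycle the plain number first, then convert to difftime,
# so that no recycling is left for the table constructor to do.
a <- c(1, 2)
b <- as.difftime(rep(1, length(a)), units = "hours")

length(b)   # 2
class(b)    # "difftime"

# With equal-length columns, the difftime class survives in the data frame.
df <- data.frame(a = a, b = b)
class(df$b) # "difftime"
```

The same vectors a and b could be passed to tibble::tibble(a = a, b = b) if the tibble package is installed.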

      

Hope this turns out to be useful.



I don't think this is about tibble encouraging the use of lubridate (even though I recommend that you use it), or really about date/time types; it is more a problem with how the vector is built internally when it is recycled. In fact, you can reproduce the same behavior by playing with c and list. For example, with c, the original types of the inputs are not kept:

c(as.difftime(c(1), units = "hours"),1)
### Time differences in hours
### [1] 1 1

      

But using list keeps the difftime type:

list(as.difftime(c(1), units = "hours"),2)

# [[1]]
# Time difference of 1 hours
# 
# [[2]]
# [1] 2

      



When list is used with tibble, you "preserve" the class:

tibble::tibble(a = c(1,2), 
               b = list(as.difftime(c(1), units = "hours")))

# A tibble: 2 x 2
# a          b
# <dbl>     <list>
#   1     1 <time [1]>
#   2     2 <time [1]>

      

But such a column is hard to manipulate later. It is better to use lubridate in this case.
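A sketch of why the list column is awkward, using only base R and an illustrative variable named col that stands in for the list column b: flattening the list with unlist discards the class, so the values have to be extracted and converted back by hand.

```r
# `col` stands in for the list column b from the example above (illustrative name).
col <- rep(list(as.difftime(1, units = "hours")), 2)

# Each element is still a difftime ...
class(col[[1]])     # "difftime"

# ... but flattening the list drops the class and the units.
class(unlist(col))  # "numeric"

# To get a usable difftime vector back, extract each value in a known unit
# and rebuild the object explicitly.
hours <- vapply(col, function(x) as.numeric(x, units = "hours"), numeric(1))
rebuilt <- as.difftime(hours, units = "hours")
class(rebuilt)      # "difftime"
```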
