Tibble silently changes recycled difftime variables
If a difftime variable is included in a tibble and has the same number of rows as the other variable(s), its class is preserved:
tibble::tibble(a = c(1,2), b = as.difftime(c(1,2), units = "hours"))
# A tibble: 2 x 2
      a b
  <dbl> <time>
1     1 1 hours
2     2 2 hours
However, if the difftime variable is shorter and its length divides evenly into the length of the other variable(s), so that the difftime variable is recycled, its class is silently changed to numeric:
tibble::tibble(a = c(1,2), b = as.difftime(1, units = "hours"))
# A tibble: 2 x 2
      a     b
  <dbl> <dbl>
1     1     1
2     2     1
Is there such a difference in behavior because tidyverse users are encouraged to represent time spans with lubridate's period or duration objects rather than base R difftime objects? Or is this an unintentional bug?
The same problem occurs with tibble::data_frame and dplyr::data_frame, although I believe these may be deprecated in the future.
To be clear, the following calls do not change the type of the time variable:
tibble::tibble(a = c(1,2), b = lubridate::as.period("1H"))
# A tibble: 2 x 2
      a b
  <dbl> <S4: Period>
1     1 1H 0M 0S
2     2 1H 0M 0S
tibble::tibble(a = c(1,2), b = lubridate::as.duration("1H"))
# A tibble: 2 x 2
      a b
  <dbl> <S4: Duration>
1     1 3600s (~1 hours)
2     2 3600s (~1 hours)
The behavior you see comes from a peculiarity of how vectors are recycled when a data frame is created. As you already know, the objects passed to data.frame must have the same number of rows, but atomic vectors are recycled a whole number of times when necessary. This raises the question of why the following doesn't work:
dff <- data.frame(a=c(1,2), b=as.difftime(1, units="hours"))
The above code produces the following error:
Error in data.frame(a = c(1, 2), b = as.difftime(1, units = "hours")) :
  arguments imply differing number of rows: 2, 1
It turns out that this fails because is.vector does not treat a difftime object as a vector: the object carries attributes other than names (its class and units). You can check this:
is.vector(as.difftime(1, units="hours"))
This returns:
[1] FALSE
As a result, when data.frame tries to recycle column b, it first checks whether the column is actually a vector (with is.vector). Since that returns FALSE, recycling does not proceed, and an error is raised.
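To see exactly why is.vector rejects the column, you can inspect the object directly. The sketch below only uses base R: the underlying storage is a plain double (so is.atomic is TRUE), but is.vector also requires that there be no attributes other than names, and a difftime carries both a class and a units attribute.

```r
x <- as.difftime(1, units = "hours")

# The storage is a plain double, so x is atomic ...
is.atomic(x)
typeof(x)
# ... but is.vector() also requires no attributes other than names,
# and x carries "class" and "units", so it returns FALSE.
is.vector(x)
names(attributes(x))
```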
So the next question is: why not simply convert column b with as.vector?
That would be a good idea, except that as.vector removes all attributes, including names, from the resulting vector. You can see this with the following:
as.vector(as.difftime(1, units="hours"))
returns:
[1] 1
All attributes of the difftime object were lost during the coercion. This makes me think that tibble::data_frame actually uses as.vector somewhere while building the data frame. As a result, we see the following behavior:
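The guess that as.vector is involved can be illustrated with a tiny recycler that flattens its input before calling rep. This naive_recycle function is hypothetical and is only an assumption about the mechanism, not tibble's actual code; it simply reproduces the same symptom, a numeric column where a difftime was expected.

```r
# Hypothetical recycler that flattens with as.vector() before rep().
# This is an assumption about the mechanism, not tibble's actual code.
naive_recycle <- function(x, n) rep(as.vector(x), length.out = n)

x <- as.difftime(1, units = "hours")
out <- naive_recycle(x, 2)
out          # plain numeric 1 1
class(out)   # "numeric": the difftime class and units are gone
```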
data_frame(a=c(1,2), b=as.difftime(1, units="hours"))
returns
# A tibble: 2 x 2
      a     b
  <dbl> <dbl>
1     1     1
2     2     1
I assume the output is the same as the one obtained by @agstudy. To preserve the difftime object, you may need to wrap column b in list, like this:
tibble::tibble(a = c(1,2), b = list(as.difftime(1, units = "hours")))
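As an alternative to a list column (not part of the original answer), you can recycle the difftime yourself before building the table, so no recycling happens inside the constructor. The sketch below uses base R's data.frame so it runs without extra packages; recycle_difftime is a hypothetical helper. Based on the question's first example, where a full-length difftime kept its class, the same vector should also work in tibble::tibble(a = c(1, 2), b = b), though that is an assumption here.

```r
# Hypothetical helper, not part of tibble or base R: recycle a difftime
# to length n by hand, re-attaching its units so the class survives.
recycle_difftime <- function(x, n) {
  stopifnot(inherits(x, "difftime"), n %% length(x) == 0)
  # as.numeric() strips the class; units(x) remembers the original units
  as.difftime(rep(as.numeric(x), length.out = n), units = units(x))
}

b <- recycle_difftime(as.difftime(1, units = "hours"), 2)
df <- data.frame(a = c(1, 2), b = b)
str(df)
```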
Hope this turns out to be useful.
I don't think this is because tibble encourages the use of lubridate for date-time types (even though I do recommend using it); it's more a problem with how the vector is built internally when recycling. In fact, you can reproduce the same behavior by playing with c and list. For example, with c you lose the type:
c(as.difftime(c(1), units = "hours"),1)
### Time differences in hours
### [1] 1 1
But using list keeps the difftime type:
list(as.difftime(c(1), units = "hours"),2)
# [[1]]
# Time difference of 1 hours
#
# [[2]]
# [1] 2
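The same contrast shows up with unlist, which is chosen here as an illustration rather than taken from the answer: a list keeps each element's attributes, but flattening it into one atomic vector drops them, which is the same kind of loss seen when the recycled column is flattened.

```r
x <- as.difftime(1, units = "hours")
l <- list(x, 2)

# Each list element keeps its own attributes, so the first is still a difftime
class(l[[1]])
# Flattening the list into one atomic vector drops the class and units
flat <- unlist(l)
flat
```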
When list is used inside tibble, you "preserve" the class type:
tibble::tibble(a = c(1,2),
b = list(as.difftime(c(1), units = "hours")))
# A tibble: 2 x 2
# a b
# <dbl> <list>
# 1 1 <time [1]>
# 2 2 <time [1]>
But such a list column is hard to work with later. It is better to use lubridate in this case.