Invalid units displayed in data.table with POSIXct arithmetic

When a duration is computed inside a data.table (v1.9.2) using POSIXct arithmetic, the wrong units can be printed. The units of the first group seem to be applied to every group.

require("data.table")
dt <- data.table(id=c(1,1,2,2), 
                  event=rep(c("start", "end"), times=2), 
                  time=c(as.POSIXct(c("2014-01-31 06:05:30", 
                                      "2014-01-31 06:45:30", 
                                      "2014-01-31 08:10:00", 
                                      "2014-01-31 09:30:00"))))
dt$time[2] - dt$time[1]  # in minutes
dt$time[4] - dt$time[3]  # in hours
dt[ , max(time) - min(time), by=id]  # wrong units printed for id 2

      

I understand that either of the following is the correct way to get the expected behavior, but I wanted to report this anyway. I am not sure whether this is really a data.table issue or a POSIXct issue.

dt[ , difftime(max(time), min(time), units="mins"), by=id]  # both in mins
dt[ , difftime(max(time), min(time), units="hours"), by=id]  # both in hours

      



3 answers


You will get the expected result if you do

dt[ , list(c(max(time) - min(time)),attr(max(time) - min(time),"units")), by=id]

      

Wrapping the difference in c() during the operation drops the attribute, so you get just the number; you then explicitly ask for the "units" attribute as a second list item, which gets the correct value in its own column. The original does not work because data.table does not handle attributes the way it handles column values, and an attribute is how POSIXct subtraction returns its units.
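
A minimal base-R sketch of where the units come from, using the timestamps from the question (no data.table needed). Subtracting two POSIXct values returns a difftime whose stored number depends on automatically chosen units, and those units are recorded only in the "units" attribute. The names times, d1, d2, vals and u are illustrative, and tz = "UTC" is an assumption added to keep the example reproducible.

```r
# Timestamps from the question; tz = "UTC" is an assumption for reproducibility
times <- as.POSIXct(c("2014-01-31 06:05:30", "2014-01-31 06:45:30",
                      "2014-01-31 08:10:00", "2014-01-31 09:30:00"),
                    tz = "UTC")

d1 <- times[2] - times[1]   # 40 minutes apart: R picks "mins"
d2 <- times[4] - times[3]   # 80 minutes apart: R picks "hours"

# The stored numbers are on different scales
vals <- c(as.numeric(d1), as.numeric(d2))        # 40 and 1.333...
u    <- c(attr(d1, "units"), attr(d2, "units"))  # "mins" and "hours"

print(data.frame(value = vals, units = u))
```

Once the two values are combined into a single column with only one "units" attribute, one of them is necessarily mislabeled.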




From Matt:

+1. Just adding a small speed improvement to avoid computing max(time) - min(time) twice:

dt[ , list(c(d<-max(time) - min(time)), attr(d,"units")), by=id]
   id        V1    V2
1:  1 40.000000  mins
2:  2  1.333333 hours

      

At least to start with, I think we will add a check for inconsistent attributes in the group results and then issue a warning or error. So this solution (or the one in the question) will probably be needed anyway.
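
To illustrate what such a check could look like, here is a rough base-R sketch. It is not data.table's implementation: check_group_units is a hypothetical helper written for this example, and the grouping is done with split() and lapply() rather than with data.table internals.

```r
# Hypothetical helper, NOT part of data.table: warn when per-group results
# carry different "units" attributes.
check_group_units <- function(results) {
  u <- unique(vapply(results, function(x) {
    a <- attr(x, "units")
    if (is.null(a)) NA_character_ else a
  }, character(1)))
  if (length(u) > 1L) {
    warning("inconsistent 'units' across groups: ", paste(u, collapse = ", "))
  }
  invisible(u)
}

# Data from the question; tz = "UTC" is an assumption for reproducibility
times <- as.POSIXct(c("2014-01-31 06:05:30", "2014-01-31 06:45:30",
                      "2014-01-31 08:10:00", "2014-01-31 09:30:00"),
                    tz = "UTC")
id <- c(1, 1, 2, 2)

# One difftime per group, computed with plain base R
groups    <- split(seq_along(times), id)
per_group <- lapply(groups, function(i) max(times[i]) - min(times[i]))
check_group_units(per_group)   # warns: the groups use mins and hours

# Forcing the units, as in the question, makes the groups consistent
per_group_mins <- lapply(groups, function(i)
  difftime(max(times[i]), min(times[i]), units = "mins"))
check_group_units(per_group_mins)   # silent
```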



This can be seen as a specification error: the table (automatically) displays the numeric value of the difftime, but you have not specified which units the result should be in. Most of the time, when you want to export or display difftime values, you should explicitly convert them to numeric in the required units. For example:



dt[ , as.numeric( max(time) - min(time), units="hours" ), by=id]
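
The same conversion can be tried outside data.table. The sketch below uses only base R and the timestamps from the question; the variable names and tz = "UTC" are assumptions made for the example.

```r
# Timestamps from the question; tz = "UTC" is an assumption for reproducibility
times <- as.POSIXct(c("2014-01-31 06:05:30", "2014-01-31 06:45:30",
                      "2014-01-31 08:10:00", "2014-01-31 09:30:00"),
                    tz = "UTC")

d1 <- times[2] - times[1]   # stored in minutes
d2 <- times[4] - times[3]   # stored in hours

# as.numeric() on a difftime accepts a units argument and converts
# to those units before dropping the class
in_hours <- c(as.numeric(d1, units = "hours"), as.numeric(d2, units = "hours"))
in_mins  <- c(as.numeric(d1, units = "mins"),  as.numeric(d2, units = "mins"))

print(in_hours)
print(in_mins)
```

Because the result is a plain numeric vector on one common scale, there is no per-group attribute left to be lost.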

      



Forcing the units is the way to go until #761 is resolved. Here's another option:

dt[ , difftime(max(time), min(time), units = 'mins'), by = id]
#    id      V1
# 1:  1 40 mins
# 2:  2 80 mins

      

This lets you keep the class of the output (difftime) if you want.

As an aside, I find it rather odd that R radically changes the stored contents of a difftime object depending on its units attribute. Elsewhere in R, this kind of conversion is handled by the print method alone, while the stored value remains consistent.
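
A small base-R sketch of the behavior described here: assigning new units to a difftime rescales the number that is actually stored, not just how it prints. The variable names are illustrative.

```r
d <- as.difftime(80, units = "mins")
stored_before <- as.numeric(d)   # 80

# Changing the units rewrites the stored value as well as the attribute
units(d) <- "hours"
stored_after <- as.numeric(d)    # 1.333...

# The duration itself is unchanged: converting back recovers 80 minutes
back_in_mins <- as.numeric(d, units = "mins")

print(c(before = stored_before, after = stored_after, back = back_in_mins))
```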
