Concurrent ordering, sorting rows and columns with data.table

I am trying to do multiple steps in one line in R to select a value from data.table (dt) with multiple criteria.

For example:

set.seed(123)
dt <- data.table(id = rep(letters[1:2],2),
             time = rnorm(4),
             value = rnorm(4)*100)

#    id        time      value
# 1:  a -0.56047565   12.92877
# 2:  b -0.23017749  171.50650
# 3:  a  1.55870831   46.09162
# 4:  b  0.07050839 -126.50612

# Now I want to select the last (maximum time) value from id == "a"
# My pseudo data.table code looks like this

dt[order(time) & id == "a" & .N, value]
# [1] 12.92877 46.09162  

      

Instead of getting two values, I only want to get the last value (which has a higher time value).

If I do it step by step, it works:

dt <- dt[order(time) & id == "a"]
dt[.N, value]
# [1] 46.09162

      

Bonus:

How can I order a data table without copying the data table. ie

dt <- dt[order(time)]

      

without <-

. Similar to the :=

-operator in, for example, dt[, new_val := value*2]

which creates a new variable without copying the entire data table.

Thanks, any idea is much appreciated!

+3


source to share


2 answers


For the first question try

dt[id == "a", value[which.max(time)]]
## [1] 46.09162

      

For a bonus question, try a setorder

function
that will order your data locally (you can also order in descending order by adding -

before time

)



setorder(dt, time)
dt
#    id        time      value
# 1:  a -0.56047565   12.92877
# 2:  b -0.23017749  171.50650
# 3:  b  0.07050839 -126.50612
# 4:  a  1.55870831   46.09162

      

In addition, if you already order your data by time, you can do both - order by link and select value

by condition in one line

setorder(dt, time)[id == "a", value[.N]]

      

+7


source


I know this is an old question, but I would like to add something. Having a similar problem, I stumbled upon this question, and while David Arenburg's answer provides a solution to this exact question, I had problems with this when trying to replace / overwrite values ​​from this filtered and ordered data table, so here is an alternative way that also allows for calls <-

directly to the filtered and ordered data.tabe.

The key is that data.table allows multiple concatenation []

with each other.

Example:

dt[id=="a", ][order(time), ][length(value), "value"] <- 0

      



This also works for multiple positions, it just provides a suitable vector as the replacement value.

Note, however, that .N

which is a list object must be replaced, eg. column length, because data.table expects an integer at that position in i

, and the column you want to select in j

must be wrapped ""

.

I found this to be a more intuitive way, and it allows not only filtering the data table but manipulating its values ​​without worrying about temporary tables.

0


source







All Articles