Concurrent ordering, sorting rows and columns with data.table
I am trying to do multiple steps in one line in R to select a value from data.table (dt) with multiple criteria.
For example:
set.seed(123)
dt <- data.table(id = rep(letters[1:2],2),
time = rnorm(4),
value = rnorm(4)*100)
# id time value
# 1: a -0.56047565 12.92877
# 2: b -0.23017749 171.50650
# 3: a 1.55870831 46.09162
# 4: b 0.07050839 -126.50612
# Now I want to select the last (maximum time) value from id == "a"
# My pseudo data.table code looks like this
dt[order(time) & id == "a" & .N, value]
# [1] 12.92877 46.09162
Instead of getting two values, I only want to get the last value (which has a higher time value).
If I do it step by step, it works:
dt <- dt[order(time) & id == "a"]
dt[.N, value]
# [1] 46.09162
Bonus:
How can I order a data table without copying the data table. ie
dt <- dt[order(time)]
without <-
. Similar to the :=
-operator in, for example, dt[, new_val := value*2]
which creates a new variable without copying the entire data table.
Thanks, any idea is much appreciated!
source to share
For the first question try
dt[id == "a", value[which.max(time)]]
## [1] 46.09162
For a bonus question, try a setorder
function that will order your data locally (you can also order in descending order by adding -
before time
)
setorder(dt, time)
dt
# id time value
# 1: a -0.56047565 12.92877
# 2: b -0.23017749 171.50650
# 3: b 0.07050839 -126.50612
# 4: a 1.55870831 46.09162
In addition, if you already order your data by time, you can do both - order by link and select value
by condition in one line
setorder(dt, time)[id == "a", value[.N]]
source to share
I know this is an old question, but I would like to add something. Having a similar problem, I stumbled upon this question, and while David Arenburg's answer provides a solution to this exact question, I had problems with this when trying to replace / overwrite values from this filtered and ordered data table, so here is an alternative way that also allows for calls <-
directly to the filtered and ordered data.tabe.
The key is that data.table allows multiple concatenation []
with each other.
Example:
dt[id=="a", ][order(time), ][length(value), "value"] <- 0
This also works for multiple positions, it just provides a suitable vector as the replacement value.
Note, however, that .N
which is a list object must be replaced, eg. column length, because data.table expects an integer at that position in i
, and the column you want to select in j
must be wrapped ""
.
I found this to be a more intuitive way, and it allows not only filtering the data table but manipulating its values without worrying about temporary tables.
source to share