Na.trim on a specific column with data.table
I am trying to use na.trim
from a package zoo
in a single column object data.table
. This is what my data looks like:
DT <- data.table(id=c(rep("a",3),rep("b",3)),
col1=c(NA,1,2,NA,3,NA),col2=c(NA,NA,5,NA,NA,NA))
id col1 col2
1: a NA NA
2: a 1 NA
3: a 2 5
4: b NA NA
5: b 3 NA
6: b NA NA
I would like to remove rows with leading NA using na.trim
and group by id. Here is the result I am expecting:
id col1 col2
1: a 1 NA
2: a 2 5
3: b 3 NA
4: b NA NA
Here is what I have tried so far. This removes the leading NA but removes col2:
DT[,na.trim(col1),by=id]
id V1
1: a 1
2: a 2
3: b 3
This also doesn't work:
DT[,.SD[na.trim(col1)],by=id]
id col1 col2
1: a NA NA
2: a 1 NA
3: b NA NA
source to share
Possible solution without using the zoo
-package:
DT[DT[, .I[!!cumsum(!is.na(col1))], by = id]$V1]
You get:
id col1 col2
1: a 1 NA
2: a 2 5
3: b 3 NA
4: b NA NA
What does it do:
- With,
DT[, .I[!!cumsum(!is.na(col1))], id]$V1
you create a vector of diamonds that you want to keep. By using!!cumsum(!is.na(col1))
, you make sure that only leading missing values ββare omittedcol1
. - Then you use this vector for a subset of the data table.
-
!!cumsum(!is.na(col1))
does the same ascumsum(!is.na(col1))!=0
. Usage!!
converts all numbers above zero toTRUE
and all zeros toFALSE
. -
.I
is optional, you can also use:,DT[DT[, !!cumsum(!is.na(col1)), by = id]$V1]
which multiplies data.table with a boolean vector.
Two alternatives with cummax
by @lmo from comments:
# alternative 1:
DT[DT[, !!(cummax(!is.na(col1))), by = id]$V1]
# alternative 2:
DT[as.logical(DT[, cummax(!is.na(col1)), by = id]$V1)]
Another alternative to @jogo:
DT[, .SD[!!cumsum(!is.na(col1))], by = id]
Another alternative to @Frank:
DT[, .SD[ rleid(col1) > 1L | !is.na(col1) ], by = id]
source to share
na.trim
will be used in the same way as with data.table. See ?na.trim
for more information on its arguments.
DT[, na.trim(.SD, sides = "left", is.na = "all"), by = id]
giving:
id col1 col2
1: a 1 NA
2: a 2 5
3: b 3 NA
4: b NA NA
ADDED:
In the comments, the poster clarified that na.trim
only one NN columns should work. In this case, add a column with row numbers, .I and after including the subset na.trim
using those row numbers.
DT[DT[, na.trim(data.table(col1, .I), "left"), by = id]$.I, ]
source to share