Na.trim on a specific column with data.table

I am trying to use na.trim

from a package zoo

in a single column object data.table

. This is what my data looks like:

DT <- data.table(id=c(rep("a",3),rep("b",3)),
                 col1=c(NA,1,2,NA,3,NA),col2=c(NA,NA,5,NA,NA,NA))
   id col1 col2
1:  a   NA   NA
2:  a    1   NA
3:  a    2    5
4:  b   NA   NA
5:  b    3   NA
6:  b   NA   NA

      

I would like to remove rows with leading NA using na.trim

and group by id. Here is the result I am expecting:

   id col1 col2
1:  a    1   NA
2:  a    2    5
3:  b    3   NA
4:  b   NA   NA

      

Here is what I have tried so far. This removes the leading NA but removes col2:

DT[,na.trim(col1),by=id]
   id V1
1:  a  1
2:  a  2
3:  b  3

      

This also doesn't work:

DT[,.SD[na.trim(col1)],by=id]
   id col1 col2
1:  a   NA   NA
2:  a    1   NA
3:  b   NA   NA

      

+3


source to share


2 answers


Possible solution without using the zoo

-package:

DT[DT[, .I[!!cumsum(!is.na(col1))], by = id]$V1]

      

You get:

   id col1 col2
1:  a    1   NA
2:  a    2    5
3:  b    3   NA
4:  b   NA   NA

      

What does it do:

  • With, DT[, .I[!!cumsum(!is.na(col1))], id]$V1

    you create a vector of diamonds that you want to keep. By using !!cumsum(!is.na(col1))

    , you make sure that only leading missing values ​​are omitted col1

    .
  • Then you use this vector for a subset of the data table.
  • !!cumsum(!is.na(col1))

    does the same as cumsum(!is.na(col1))!=0

    . Usage !!

    converts all numbers above zero to TRUE

    and all zeros to FALSE

    .
  • .I

    is optional, you can also use:, DT[DT[, !!cumsum(!is.na(col1)), by = id]$V1]

    which multiplies data.table with a boolean vector.



Two alternatives with cummax

by @lmo from comments:

# alternative 1:
DT[DT[, !!(cummax(!is.na(col1))), by = id]$V1]

# alternative 2:
DT[as.logical(DT[, cummax(!is.na(col1)), by = id]$V1)]

      

Another alternative to @jogo:

DT[, .SD[!!cumsum(!is.na(col1))], by = id]

      

Another alternative to @Frank:

DT[, .SD[ rleid(col1) > 1L | !is.na(col1) ], by = id]

      

+4


source


na.trim

will be used in the same way as with data.table. See ?na.trim

for more information on its arguments.

DT[, na.trim(.SD, sides = "left", is.na = "all"), by = id]

      

giving:

   id col1 col2
1:  a    1   NA
2:  a    2    5
3:  b    3   NA
4:  b   NA   NA

      



ADDED:

In the comments, the poster clarified that na.trim

only one NN columns should work. In this case, add a column with row numbers, .I and after including the subset na.trim

using those row numbers.

DT[DT[, na.trim(data.table(col1, .I), "left"), by = id]$.I, ]

      

+4


source







All Articles