Strange behavior for filtering in data.table

I came across a weird data.table behavior: filtering in i returns a row of NA values where I would expect an empty data.table. See:

library(data.table)

a = data.table(a = 1, d = NA) 
a[!is.na(a) & d == "3"] 
#     a  d
# 1: NA NA

      

Here I would expect an empty data.table as the result. Compare with:

a = data.table(a = c(1,2), d = c(NA,3))
a[!is.na(a) & d == "3"] 
#    a d
# 1: 2 3

      

This does not produce an extra row of NA values. Is this a bug in data.table, or is there some logic behind this behavior that someone can explain?
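Both results follow from R's three-valued logic for NA. A minimal base-R sketch (no data.table needed; a1, d1, a2 and d2 are illustrative names standing in for the columns) evaluates each condition as a plain vector:

```r
# Case 1: one row, d is a logical NA
a1 <- 1
d1 <- NA
cond1 <- !is.na(a1) & d1 == "3"
print(cond1)       # NA: NA == "3" is NA, and TRUE & NA is NA

# Case 2: two rows, the second of which matches
a2 <- c(1, 2)
d2 <- c(NA, 3)
cond2 <- !is.na(a2) & d2 == "3"
print(cond2)       # NA TRUE

# %in% never returns NA, so the missing value gives a plain FALSE
cond1_safe <- !is.na(a1) & d1 %in% "3"
print(cond1_safe)  # FALSE
```

In the two-row case data.table treats the NA element of i as FALSE, so only the matching row comes back. In the one-row case the whole condition collapses to a single NA, which is the DT[NA] edge case. Writing d %in% "3" instead of d == "3" sidesteps it, because %in% returns FALSE for missing values.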



Thanks for the ping @SergiiZaskaleta. I forgot to update this question, but it was fixed a while ago with this commit.

From NEWS:



  1. Subsets using logical expressions in i never return all-NA rows. The edge case DT[NA] is now fixed, #1252. Thanks to @sergiizaskaleta.
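The fixed behavior amounts to treating NA in a logical i as FALSE. On a version that predates the fix, the same effect can be had by cleaning the condition yourself. A minimal sketch on a base data.frame (where a length-one NA index also yields an all-NA row); na_false() is a hypothetical helper, not part of data.table:

```r
df <- data.frame(a = 1, d = NA)
cond <- !is.na(df$a) & df$d == "3"   # a single logical NA

# Hypothetical helper: map every NA in a logical vector to FALSE
na_false <- function(x) !is.na(x) & x

print(na_false(cond))                # FALSE
print(nrow(df[na_false(cond), ]))    # 0
# which() returns only the positions of TRUE values, so NA is dropped as well
print(nrow(df[which(cond), ]))       # 0
```

The same wrapper should also work inside a data.table i, e.g. a[na_false(!is.na(a) & d == "3")], since the condition then reaches i as a plain FALSE.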


Don't know if this is a bug or not, but it looks like it has something to do with the type of your d variable.

a = data.table(a = 1, d = NA) 
str(a)
# Classes ‘data.table’ and 'data.frame':    1 obs. of  2 variables:
#  $ a: num 1
#  $ d: logi NA
#  - attr(*, ".internal.selfref")=<externalptr> 

a[!is.na(a) & d == "3"] # this returns NAs
#     a  d
# 1: NA NA

a[!is.na(a) & !is.na(d)] # this returns nothing
# Empty data.table (0 rows) of 2 cols: a,d

      

This also works:

a = data.table(a = 1, d = 4) 
str(a)
# Classes ‘data.table’ and 'data.frame':    1 obs. of  2 variables:
#  $ a: num 1
#  $ d: num 4
#  - attr(*, ".internal.selfref")=<externalptr> 

a[!is.na(a) & d == "3"]
#     Empty data.table (0 rows) of 2 cols: a,d
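For what it's worth, R does allow comparing a logical value to a string: the logical is coerced to character first. A quick base-R check suggests that the NA value, rather than the logical type, is what produces the NA:

```r
# Comparisons against the string "3"; R coerces the other side to character
res_true <- TRUE == "3"   # "TRUE" == "3" gives FALSE
res_num  <- 4 == "3"      # "4" == "3" gives FALSE
res_na   <- NA == "3"     # a missing value gives NA
```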

      



It looks like the comparison d == "3" returns NA rather than FALSE when d holds a logical NA, so the whole condition evaluates to NA. However, with the dplyr package, it works:

library(dplyr)

a = data.table(a = 1, d = NA) 
a %>% filter(!is.na(a) & d == "3")
# Empty data.table (0 rows) of 2 cols: a,d

      

It's the same with the subset command:

subset(a, !is.na(a) & d == "3")
# Empty data.table (0 rows) of 2 cols: a,d
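Plain data.frames show the same split in base R: subset() drops rows whose condition is NA, while indexing with [ keeps a logical NA and returns an all-NA row. A minimal sketch, assuming base R only:

```r
df <- data.frame(a = 1, d = NA)

# subset() evaluates the condition inside df and treats an NA result as FALSE
out_subset <- subset(df, !is.na(a) & d == "3")
print(nrow(out_subset))    # 0

# The same condition used directly as a row index recycles the NA into an all-NA row
out_bracket <- df[!is.na(df$a) & df$d == "3", ]
print(nrow(out_bracket))   # 1
```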

      
