Strange behavior for filtering in data.table
I came across a weird data.table behavior: a filter in i returns a row of NA where I would expect an empty data.table. Consider:
a = data.table(a = 1, d = NA)
a[!is.na(a) & d == "3"]
# a d
# 1: NA NA
I would expect an empty data.table here. Compare with:
a = data.table(a = c(1,2), d = c(NA,3))
a[!is.na(a) & d == "3"]
# a d
# 1: 2 3
This does not create an extra row of NA values. Is this a bug in data.table, or is there some logic behind this behavior that someone can explain?
Thanks for the ping @SergiiZaskaleta. I forgot to update this question, but it was fixed a while ago with this commit.
From NEWS:
- Subsets using logical expressions in i never return all-NA rows. The edge case DT[NA] is now fixed, #1252. Thanks to @sergiizaskaleta.
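To see why the first example hits the DT[NA] edge case, here is a small base R sketch (no data.table needed) of how the two filter conditions from the question evaluate. The variable names a1, d1, a2, d2, cond1 and cond2 are just illustrative stand-ins for the columns, not anything from the package.

```r
# Stand-ins for the columns of data.table(a = 1, d = NA)
a1 <- 1
d1 <- NA                       # a logical NA

# NA == "3" is NA, and TRUE & NA is NA, so the whole filter is a single NA
cond1 <- !is.na(a1) & d1 == "3"
print(cond1)

# Stand-ins for the columns of data.table(a = c(1, 2), d = c(NA, 3))
a2 <- c(1, 2)
d2 <- c(NA, 3)

# Here the filter is a vector that contains an NA next to a TRUE
cond2 <- !is.na(a2) & d2 == "3"
print(cond2)
```

So the first filter is effectively a[NA], which is the edge case the NEWS item mentions, while the second is a longer logical vector in which the NA is treated as no match.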
Don't know if this is a bug or not, but it looks like it has something to do with the type of your d variable.
a = data.table(a = 1, d = NA)
str(a)
# Classes ‘data.table’ and 'data.frame': 1 obs. of 2 variables:
# $ a: num 1
# $ d: logi NA
# - attr(*, ".internal.selfref")=<externalptr>
a[!is.na(a) & d == "3"] # this returns NAs
# a d
# 1: NA NA
a[!is.na(a) & !is.na(d)] # this returns nothing
# Empty data.table (0 rows) of 2 cols: a,d
This also works:
a = data.table(a = 1, d = 4)
str(a)
# Classes ‘data.table’ and 'data.frame': 1 obs. of 2 variables:
# $ a: num 1
# $ d: num 4
# - attr(*, ".internal.selfref")=<externalptr>
a[!is.na(a) & d == "3"]
# Empty data.table (0 rows) of 2 cols: a,d
It looks like the comparison d == "3" returns NA when d is NA (a logical NA here), so the whole condition evaluates to NA. However, with the dplyr package, it works:
library(dplyr)
a = data.table(a = 1, d = NA)
a %>% filter(!is.na(a) & d == "3")
# Empty data.table (0 rows) of 2 cols: a,d
It's the same with the subset command:
subset(a, !is.na(a) & d == "3")
# Empty data.table (0 rows) of 2 cols: a,d
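For comparison, a plain data.frame shows the same contrast, so this may be easier to reproduce without data.table or dplyr installed. This is only a sketch in base R; the names df, cond, na_row, no_rows and via_subset are made up for illustration. Wrapping the condition in which() is one way to drop the NA before subsetting.

```r
df <- data.frame(a = 1, d = NA)

# The filter condition is a single NA (TRUE & NA)
cond <- !is.na(df$a) & df$d == "3"

# Indexing rows with a logical NA yields an all-NA row
na_row <- df[cond, ]

# which() keeps only the positions that are TRUE, so the NA is dropped
no_rows <- df[which(cond), ]

# subset() treats NA in the condition as FALSE
via_subset <- subset(df, !is.na(a) & d == "3")

print(nrow(na_row))
print(nrow(no_rows))
print(nrow(via_subset))
```

The same which() wrapper should also work inside a data.table's i on versions that predate the fix, though I have only sketched it here for a data.frame.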