Inconsistent behavior in R when indexing a vector by an empty index

Consider removing the elements of a vector that match a specific set of criteria. The expected behavior is to remove those that match and, in particular, to remove none if none match:

> d = 1:20
> d
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> d[-which(d > 10)]
 [1]  1  2  3  4  5  6  7  8  9 10
> d[-which(d > 100)]
integer(0)

      

The final statement did something very unexpected: it returned an empty vector, silently, with no error or warning.

At first I thought this was an unwanted (but consistent) consequence of the design choice that an empty index selects all elements of a vector, as documented at

http://stat.ethz.ch/R-manual/R-devel/library/base/html/Extract.html

and as commonly used, for example, to select the first column of a matrix m by writing

m[ , 1]
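To make the distinction concrete, here is a small sketch contrasting a missing index with a zero-length index vector. The objects d and m are made-up examples, not taken from the code above.

```r
d <- 1:5
m <- matrix(1:6, nrow = 2)

d[]             # missing index: selects all five elements
d[integer(0)]   # zero-length index vector: selects nothing
m[, 1]          # missing row index: every row of column 1
```

A missing index and a zero-length index vector are therefore two different things as far as subsetting is concerned.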

      

However, the behavior seen here is consistent with the interpretation of an empty vector as "no elements" rather than "all elements":

> a = integer(0)

      

Selecting "no elements" works exactly as expected:

> v[a]
numeric(0)

      

However, removing "no elements" does not:

> v[-a]
numeric(0)

      

So for an empty index vector, selecting no elements and removing no elements both return nothing, which is inconsistent: removing no elements should leave all of them.

Obviously, this problem can be worked around by checking that which() returns a result of non-zero length, or by using a logical expression, as described in the question "In R, why does deleting rows or columns by an empty index result in empty data? Or, what's the 'correct' way to delete?"

but my two questions are:

  • Why is the behavior inconsistent?
  • Why does it silently do the wrong thing with no error or warning?


1 answer


This doesn't work because which(d > 100) and -which(d > 100) are the same object: there is no difference between an empty vector and the negation of that empty vector.

For example, imagine you had done:

d = 1:10

indexer = which(d > 100)
negative_indexer = -indexer

      

The two variables would be identical (which is the only consistent behavior: negating every element of an empty vector leaves it unchanged, since it has no elements).



indexer
#> integer(0)
negative_indexer
#> integer(0)
identical(indexer, negative_indexer)
#> [1] TRUE

      

At this point, you could not expect d[indexer] and d[negative_indexer] to give different results. There is also no room to provide an error or warning: when you pass in an empty vector, R has no way of knowing that you "meant" a negative version of that empty vector.
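Since the subsetting operator cannot tell the two cases apart, any guard has to live in the calling code. A minimal sketch of the length check mentioned in the question, assuming a hypothetical helper named drop_at() (it is not part of base R):

```r
# Hypothetical helper (not part of base R): drop the elements of x at
# positions idx, treating an empty idx as "drop nothing".
drop_at <- function(x, idx) {
  if (length(idx) == 0) {
    return(x)   # nothing to remove, so return x unchanged
  }
  x[-idx]
}

d <- 1:20
drop_at(d, which(d > 10))    # elements 1 through 10
drop_at(d, which(d > 100))   # all 20 elements, since nothing matched
```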


The solution is that there is no reason you need which() at all for subsetting: you can use d[!(d > 10)] instead of d[-which(d > 10)] from your original example. In the same way, you can use d[!(d > 100)] or d[d <= 100] for your negative indexing. This behaves as you would expect, because !(d > 10) and !(d > 100) are logical vectors, not index vectors.
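As a quick check of the logical-indexing approach, the following sketch reuses the vector d from the question:

```r
d <- 1:20

# Logical indexing keeps the elements where the condition is TRUE,
# so negate the "remove" condition to drop the matches.
d[!(d > 10)]    # elements 1 through 10
d[!(d > 100)]   # all 20 elements: nothing matches, so nothing is dropped
d[d <= 100]     # the same result, written without the negation
```

One difference to keep in mind: which() skips NA values, whereas a logical index that contains NA produces NA in the result, so the two approaches can differ on data with missing values.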
