How can I filter coordinates (lat, lon) in a data.table?

TL;DR

This image of a left outer join shows exactly what I want: deleting rows from one data.table based on two columns (lat, lon) that exactly match the lat, lon columns of another data.table.

Problem

Suppose I have the following data.table, "dt.master", with over 1 million rows, each containing an id and the coordinates (lat, lon) of a specific location:

id    lat      lon
1     43.23    5.43
2     43.56    4.12
3     52.14   -9.85
4     43.56    4.12
5     43.83    9.43
...   ...      ...
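
For reference, a minimal sketch that builds just the five example rows shown above (the real table has over a million rows):

library(data.table)

# Hypothetical reconstruction of the example rows above
dt.master <- data.table(
  id  = 1:5,
  lat = c(43.23, 43.56, 52.14, 43.56, 43.83),
  lon = c(5.43, 4.12, -9.85, 4.12, 9.43)
)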

      

What I would like to do is remove the rows that match a specific pair of coordinates. Suppose a couple of coordinate pairs are blacklisted (again a data.table, named "dt.blacklist"):

lat      lon
43.56    4.12
11.14   -5.85
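
And a matching sketch for the blacklist:

# Hypothetical reconstruction of the blacklist shown above
dt.blacklist <- data.table(
  lat = c(43.56, 11.14),
  lon = c(4.12, -5.85)
)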

      

In this case, after applying the blacklist, the result should be:

id    lat      lon
1     43.23    5.43
3     52.14   -9.85
5     43.83    9.43
...   ...      ...  

      

Oddly enough, I cannot get it right.

What I have done so far

  • Using merge, for example:

    dt.result <- merge(dt.master, dt.blacklist[, c("lat", "lon")], by.x = c("lat", "lon"), by.y = c("lat", "lon"))

    But this returns the rows that do match, i.e. an inner join. I then thought about deleting rows based on this result using subset:

    subset(dt.master, lat != dt.result$lat & lon != dt.result$lon)

    But this only partially works: in this example it deletes one row, not the two it should. Somehow it only removes the first "hit".

  • Using a quick and dirty solution: concatenating lat and lon into a new column named "C" in both data.tables and then filtering like this:

    dt.master[C != dt.blacklist$C]

    However, the same problem occurs: only one of the two rows is deleted. (A set-based version of this key idea is sketched right after this list.)
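
Both attempts run into the same issue: comparing a column against dt.result$lat or dt.blacklist$C with != is an element-wise comparison (with recycling), not a set-membership test, so which rows end up dropped depends on how the rows happen to line up. Below is a minimal sketch of a set-based version of the concatenated-key idea (how the C column is built here is an assumption):

# Build a combined key in both tables (added by reference), then keep only the
# rows of dt.master whose key does not appear in the blacklist
dt.master[,    C := paste(lat, lon, sep = "_")]
dt.blacklist[, C := paste(lat, lon, sep = "_")]

dt.master[!C %in% dt.blacklist$C]   # keeps ids 1, 3 and 5 in the example above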



2 answers


I think you are looking for this; the ! in front of dt.blacklist turns the join into an anti-join, keeping only the rows of dt.master that have no match on lat and lon:

dt.master[!dt.blacklist, on = .(lat,lon)]

      

Output:

   id   lat   lon
1:  1 43.23  5.43
2:  3 52.14 -9.85
3:  5 43.83  9.43

      



As others have pointed out, joining on floating-point values can have unintended side effects. Converting the coordinates to integers prevents this; as a result, the join looks a little more involved:

dt.master[, (2:3) := lapply(.SD, function(x) as.integer(x * 100)), .SDcols = 2:3
          ][!dt.blacklist[, (1:2) := lapply(.SD, function(x) as.integer(x * 100))], on = .(lat, lon)
            ][, (2:3) := lapply(.SD, `/`, 100), .SDcols = 2:3][]

      

The output is the same:

   id   lat   lon
1:  1 43.23  5.43
2:  3 52.14 -9.85
3:  5 43.83  9.43
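
Note that the chain above modifies the lat/lon columns of dt.master and dt.blacklist by reference. A minimal sketch of the same idea that leaves both original tables untouched, assuming two decimal places of precision and hypothetical helper columns lat_i / lon_i:

# Join on temporary integer key columns instead of overwriting lat/lon
m <- copy(dt.master)
b <- copy(dt.blacklist)
m[, c("lat_i", "lon_i") := .(as.integer(round(lat * 100)), as.integer(round(lon * 100)))]
b[, c("lat_i", "lon_i") := .(as.integer(round(lat * 100)), as.integer(round(lon * 100)))]
m[!b, on = .(lat_i, lon_i)][, c("lat_i", "lon_i") := NULL][]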

      





We can use fsetdiff from data.table (here df1 is dt.master and df2 is dt.blacklist; note that only the lat/lon columns are compared, so the id column is not in the result):

fsetdiff(df1[, -1], df2)

      




Or we can use anti_join from dplyr:

library(dplyr)
anti_join(df1, df2)
#  id   lat   lon
#1  1 43.23  5.43
#2  3 52.14 -9.85
#3  5 43.83  9.43
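
By default anti_join joins on all columns the two tables have in common; to make the join columns explicit, the same call can be written as follows:

# Explicitly join on the coordinate columns only
anti_join(df1, df2, by = c("lat", "lon"))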

      
