Calculating distance between points in different data frames

I'm trying to find the distance between points in two different dataframes, given that they have the same value in one of their columns.

I believe the first step is to combine or link the data in the two data frames. For example, data frames A and B both contain lat/long information and share a column Name. Note that for a given Name, the lat/long information is different in each data frame. This is why I want to calculate the distance between them.

I assume the final function will be something like: if A$Name == B$Name, then use their respective lat/long data to calculate the distance between them.

Any thoughts?

Sample data:

A <- data.frame(Lat=1:4,Long=1:4,Name=c("a","b","c","d"))
B <- data.frame(Lat=5:8,Long=5:8,Name=c("a","b","c","d"))

      

Now I want to link A and B so that I can ask the final question: if A$Name == B$Name, what is the distance between them, using their respective lat/long data?

I should also point out that I cannot use a simple Euclidean distance, because the points are in the water and the path between them must stay in the water (or be limited to some area). Any help with this would be appreciated.



2 answers


To calculate the distance between lat/long points, you can use the distm function from the geosphere package. Several formulas are available for calculating the distance: distCosine, distHaversine, distVincentySphere and distVincentyEllipsoid. The last one is considered the most accurate (according to the author of the package).

library(geosphere)

A <- data.frame(Lat=1:4, Long=1:4, Name=c("a","b","c","d"))
B <- data.frame(Lat=5:8, Long=5:8, Name=c("a","b","c","d"))

A$distance <- distVincentyEllipsoid(A[,c('Long','Lat')], B[,c('Long','Lat')])

      

This gives:

> A
  Lat Long Name distance
1   1    1    a 627129.5
2   2    2    b 626801.7
3   3    3    c 626380.6
4   4    4    d 625866.6

      

Note that you must pass the coordinate columns in the order longitude first, then latitude.
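To make the idea of a great-circle distance concrete, here is a minimal base R sketch of the haversine formula. It is not the geosphere implementation: the function name haversine_m and the spherical Earth radius of 6378137 m are assumptions made for illustration, and the results will differ slightly from the ellipsoidal Vincenty distances shown above.

```r
# Haversine great-circle distance in metres (base R only).
# lon/lat are in decimal degrees; r is an assumed spherical Earth radius.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(h) just above 1
  2 * r * asin(pmin(1, sqrt(h)))
}

A <- data.frame(Lat = 1:4, Long = 1:4, Name = c("a", "b", "c", "d"))
B <- data.frame(Lat = 5:8, Long = 5:8, Name = c("a", "b", "c", "d"))

# Row i of A is compared with row i of B, so both must be in the same order.
A$distance_hav <- haversine_m(A$Long, A$Lat, B$Long, B$Lat)
print(A)
```

Like the geosphere call, this compares the data frames row by row, so it is only valid when both list the names in the same order.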


While this works fine for this simple example, it will cause problems on large datasets where the names are not in the same order. In that case, you can use data.table and set keys so that you can match the points and calculate the distance (as @MichaelChirico did in his answer):

library(data.table)
A <- data.table(Lat=1:4, Long=1:4, Name=c("a","b","c","d"), key="Name")
B <- data.table(Lat=8:5, Long=8:5, Name=c("d","c","b","a"), key="Name")

A[B,distance:=distVincentyEllipsoid(A[,.(Long,Lat)], B[,.(Long,Lat)])]

      



As you can see, this gives the correct result (i.e. the same result as the previous method):

> A
   Lat Long Name distance
1:   1    1    a 627129.5
2:   2    2    b 626801.7
3:   3    3    c 626380.6
4:   4    4    d 625866.6

      


To see what key="Name" does, compare the following two data tables:

B1 <- data.table(Lat=8:5, Long=8:5, Name=c("d","c","b","a"), key="Name")
B2 <- data.table(Lat=8:5, Long=8:5, Name=c("d","c","b","a"))
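The effect of matching on Name rather than on row position can also be sketched in base R, without data.table. This is only an illustration of the alignment step: it assumes every Name is unique and present in both data frames (match() returns NA otherwise), and B_aligned is a name introduced here.

```r
A <- data.frame(Lat = 1:4, Long = 1:4, Name = c("a", "b", "c", "d"))
# B lists the same names in reverse order, as in the data.table example
B <- data.frame(Lat = 8:5, Long = 8:5, Name = c("d", "c", "b", "a"))

# For each name in A, find the row of B holding the same name
idx <- match(A$Name, B$Name)
B_aligned <- B[idx, ]

# After alignment, row i of both data frames refers to the same Name
stopifnot(identical(as.character(A$Name), as.character(B_aligned$Name)))
print(B_aligned)
```

Once B is aligned this way, a row-wise distance function can be applied to A and B_aligned safely.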

      


See also this answer for a more detailed example.



Without a reproducible example, all I can do is offer you a general solution.

I like data.table, and its syntax is very simple here. Check out Getting Started for more information on the package.

I'm going to create two data.tables that match your general description first:

library(data.table)
set.seed(1734)
A<-data.table(Name=1:10,x=rnorm(10),key="Name")
B<-data.table(Name=1:10,y=rnorm(10),key="Name")

      

Now we want to merge A and B using Name (for the merge we need a key to be set, which I've already done), and then use the corresponding x and y to calculate the (Euclidean) distance. It's easy to do this:

A[B,distance:=sqrt(x^2+y^2)]

      



Now the distance is stored in data.table A under the column distance. If you do not want to store the distance and just want the output, you can do: A[B,sqrt(x^2+y^2)].
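For comparison, the same join-then-compute idea can be sketched with base R's merge(), which needs no keys. The data frame AB is a name introduced here for illustration, and B is deliberately built in reverse order to show that the row order does not matter.

```r
set.seed(1734)
A <- data.frame(Name = 1:10, x = rnorm(10))
B <- data.frame(Name = 10:1, y = rnorm(10))  # deliberately in reverse order

# merge() pairs rows by Name, whatever their order in A and B
AB <- merge(A, B, by = "Name")

# Same formula as in the data.table join above
AB$distance <- sqrt(AB$x^2 + AB$y^2)
print(AB)
```

Unlike the := join, merge() returns a new data frame rather than modifying A by reference.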

To start from scratch, if A and B have already been saved as data.frames, it's not much more complicated:

setDT(A,key="Name")[setDT(B,key="Name"),distance:=sqrt(x^2+y^2)]

      

We used the convenient setDT function to convert A and B to data.tables by reference (in place), at the same time declaring Name as the key for both. *

* It is not necessary to set the key on B, but I think it is good practice. Also, the key argument of setDT is only available in the development version of data.table (1.9.5+); with the CRAN version, use setkey(setDT(A),Name), etc.
