Distance from zip code R
I am using the zipcode package in R and would like to list all the zip codes that are within 10, 20, or X miles of each zip code. From there, I will roll up zip-level data to the 10-, 20-, or X-mile level. Currently I join every zip to every zip (so the number of rows is squared), calculate the distance between each pair of zip codes, and then eliminate pairs with distances greater than 10, 20, or X miles. Is there a better way to do this in R so I don't have to compute all possibilities? I'm new to R. Thanks!
Code is here:
#Bringing in Zipcode database.
library(zipcode)
data(zipcode)
#Limiting to certain states that I want to include,
SEZips <- zipcode[zipcode$state %in% c("GA","AL", "SC", "NC"),]
#Duplicating the data set to join it together
SEZips2 <- SEZips
#To code in SQL
library(sqldf)
#Creating a common match so I can join all rows from both tables together
SEZips$Match <- 1
SEZips2$Match <- 1
#attaches every zip code to each zip
ZipList <- sqldf("
SELECT
A.zip as zip1,
A.longitude as lon1,
A.latitude as lat1,
B.zip as zip2,
B.longitude as lon2,
B.latitude as lat2
From SEZips A
Left Join SEZips2 B
on A.Match = B.Match
")
#to get the distance calculation, use package geosphere,
library(geosphere)
#radius of Earth in miles, adjust for km, etc.
r = 3959
#Creating Table of the coordinates. Makes it easy to calc distance
Points1 <- cbind(ZipList$lon1,ZipList$lat1)
Points2 <- cbind(ZipList$lon2,ZipList$lat2)
distance <- distHaversine(Points1,Points2,r)
#Adding distance back on to the original ZipList
ZipList$Distance <- distance
#To limit to a certain radius, e.g. 15 for 15 miles.
z = 15
#Eliminating matches > z
ZipList2 <- ZipList[ZipList$Distance <= z,]
#Adding data to roll up, e.g. population
ZipPayroll <- read.csv("filepath/ZipPayroll.csv")
#Changing Zip from integer to a 5-character string. A little bit of pain
#Essentially the code left-pads the number with zeros to a width of 5
ZipPayroll$Zip2 <- sprintf("%05d", as.integer(ZipPayroll$zip))
#Joining Payroll info to SEZips dataframe
SEZips <- sqldf("
SELECT
A.*,
B.Payroll,
B.Employees,
B.Establishments
From SEZips A
Left Join ZipPayroll B
on A.zip = B.Zip2
")
#Rolling up to 15 mile level
SEZips15 <- sqldf("
SELECT
A.zip1 as Zip,
Sum(B.Payroll) as PayrollArea,
Sum(B.Employees) as EmployeesArea,
Sum(B.Establishments) as EstablishmentsArea
From ZipList2 A
Left Join SEZips B
on A.zip2 = B.zip
Group By A.zip1
")
#Include the original Zip data
SEZips15 <- sqldf("
SELECT
A.*,
B.Payroll,
B.Employees,
B.Establishments
From SEZips15 A
Left Join SEZips B
on A.zip = B.zip
")
#Calculate Average Pay for Zip and Area
SEZips15$AvgPayArea <- SEZips15$PayrollArea / SEZips15$EmployeesArea
SEZips15$AvgPay <- SEZips15$Payroll / SEZips15$Employees
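One possible way to avoid computing every pair, as a minimal sketch rather than a tested solution: a great-circle distance is never smaller than the north-south distance between two points, so any zip whose latitude differs by more than z miles' worth of degrees can be skipped before the exact distance is calculated. The helper names `haversine_miles` and `zips_within` and the three demo rows are hypothetical and the coordinates are made up. `haversine_miles` is a base-R stand-in for `distHaversine` so the example runs without extra packages.

```r
# Haversine distance in miles (base-R stand-in for geosphere::distHaversine)
haversine_miles <- function(lon1, lat1, lon2, lat2, r = 3959) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: all (zip1, zip2) pairs within z miles.
# Expects columns zip, latitude, longitude, as in the zipcode data.
zips_within <- function(zips, z, r = 3959) {
  # Degrees of latitude that span z miles
  lat_band <- z / (r * pi / 180)
  pairs <- lapply(seq_len(nrow(zips)), function(i) {
    # Cheap pre-filter: keep only zips inside the latitude band
    cand <- zips[abs(zips$latitude - zips$latitude[i]) <= lat_band, ]
    # Exact distance only for the remaining candidates
    d <- haversine_miles(zips$longitude[i], zips$latitude[i],
                         cand$longitude, cand$latitude, r)
    keep <- d <= z
    data.frame(zip1 = rep(zips$zip[i], sum(keep)),
               zip2 = cand$zip[keep],
               Distance = d[keep],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pairs)
}

# Made-up zips and coordinates, for illustration only
demo <- data.frame(
  zip = c("00001", "00002", "00003"),
  latitude = c(33.00, 33.05, 35.00),
  longitude = c(-84.00, -84.00, -84.00),
  stringsAsFactors = FALSE
)
near <- zips_within(demo, z = 15)
```

As in the cross join above, each zip is matched with itself at distance 0. The result could stand in for ZipList2, for example `ZipList2 <- zips_within(SEZips, 15)`. A similar band on longitude would cut the candidates further, but its width depends on latitude, so it is left out of this sketch.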