Scalability Location Distance Search Over 100,000 LatLng positions in the USA
Scenario =
1) Delivery offices are distributed across the United States with a maximum delivery radius limit in miles.
2) The geo target address converted to LatLng is the delivery destination.
Target = Return a dataset of delivery offices (1) for which the delivery limit is at this distance to the target address (2)
Attempt =
As a starting point for my problem, I use a great example of a Storm consultant to determine the closest office for a client: Haversine distance between two points
The table "My offices" stores the address of the office together with its Lat and Lng values, as well as their maximum "deliveryLimit" distance.
The SQL for calculating Haversine is blowing my mind and is currently beyond my power!
The stormy SQL is below, but instead of only selecting one row from the direct calculations, I need to select all rows of offices whose maximum delivery limit is less than the distance between office and client.
Question1 = How do I add a maximum distance filter to the SQL query so that it returns offices with a delivery zone that includes the target location?
Question2 = How can I limit the number of offices requested by those that can actually be located in the US target area? For example, if the target location is Boise, Idaho and the Los Angeles offices, California has a delivery distance limit of 300 miles. There is no point in even asking for such offices. However, the offices in Washington; Oregon and Northern Nevada state that Idaho boundaries should be included in the search query, as some may have maximum distance values that reach this Boise, ID example.
SQL for Haversine used by Storm:
SELECT TOP 1 *, ( 3960 * acos( cos( radians( @custLat ) ) *
cos( radians( Lat ) ) * cos( radians( Lng ) - radians( @custLng ) ) +
sin( radians( @custLat ) ) * sin( radians( Lat ) ) ) ) AS Distance
FROM Offices
ORDER BY Distance ASC
The above SQL example only selects the closest office to the target lat / long (@custLng)
The storm approached the distance calculation from two different directions. The SQL above was the first. The second way was to store the coordinates of the office in a list in memory and create a method with a function to cycle through the list, calculating the distances and finally choosing the closest one, for example:
/// <summary>
/// Returns the distance in miles or kilometers of any two
/// latitude / longitude points.
/// </summary>
/// <param name="pos1">Location 1</param>
/// <param name="pos2">Location 2</param>
/// <param name="unit">Miles or Kilometers</param>
/// <returns>Distance in the requested unit</returns>
public double HaversineDistance(LatLng pos1, LatLng pos2, DistanceUnit unit)
{
double R = (unit == DistanceUnit.Miles) ? 3960 : 6371;
var lat = (pos2.Latitude - pos1.Latitude).ToRadians();
var lng = (pos2.Longitude - pos1.Longitude).ToRadians();
var h1 = Math.Sin(lat / 2) * Math.Sin(lat / 2) +
Math.Cos(pos1.Latitude.ToRadians()) *
Math.Cos(pos2.Latitude.ToRadians()) *
Math.Sin(lng / 2) * Math.Sin(lng / 2);
var h2 = 2 * Math.Asin(Math.Min(1, Math.Sqrt(h1)));
return R * h2;
}
public enum DistanceUnit { Miles, Kilometers };
and
var Offices = GetMyOfficeList();
for(int i = 0; i< Offices.Count; i++)
{
Offices[i].Distance = HaversineDistance(
coord,
new LatLng(Offices[i].Lat, Offices[i].Lng),
DistanceUnit.Miles);
}
var closestOffice = Offices.OrderBy(x => x.Distance).Take(1).Single();
Scalability is important as my script can easily get much more than 100,000 offices, so listing office numbers in a layout is unlikely!
source to share
If you are using Sql2008 or newer, it has special types built in to make what you want to do easier. The main type you need to use isgeography
I'm going to make some guesswork on your table structure, but the main thing is that you have Location
andDeleveryArea
create table Offices
(
OfficeName varchar(40),
Location geography,
DeliveryDistance float, --stored in miles
--If you are on SQL2008 or 2008R2 replace BufferWithCurves with one of the older Buffer functions
DeliveryArea as Location.BufferWithCurves(DeliveryDistance * 1609.34) PERSISTED, --1609.34 converts miles to meters
)
I used BufferWithCurves in my example above, but this is only available on Sql2012 and newer, if you are on 2008 or 2008R2 you will need to use BufferWithTolerance or STBuffer, or just manually define your own scope manually in the insert status.
Now, to populate the data, because we made a DeliveryArea
computed persistent column, it's actually pretty easy to do. All you have to do is set the location of the office and the radius of its delivery area and it will calculate the circle for you. I will use the examples given in your example:
insert into Offices (OfficeName, Location, DeliveryDistance)
values ('Boise, ID',
geography::Point(43.6187102,-116.2146068, 4326), --4326 represents a "lat and long" coordinate system
300
)
insert into Offices (OfficeName, Location,DeliveryDistance)
values ('LA, CA',
geography::Point(34.0204989,-118.4117325, 4326),
300
)
insert into Offices (OfficeName, Location,DeliveryDistance)
values ('Walla Walla, WI',
geography::Point(46.0627549,-118.3337259, 4326),
300
)
insert into Offices (OfficeName, Location,DeliveryDistance)
values ('Baker City, OR',
geography::Point(44.7746169,-117.8317284, 4326),
300
)
insert into Offices (OfficeName, Location,DeliveryDistance)
values ('Elko, NV',
geography::Point(40.846931,-115.7669825, 4326),
300
)
Now for your request, if you want to find which service zones are served by Jordan Vally, Oregon (42.9740245, -117.0533247), your request will simply be
declare @cust geography = geography::Point(42.9740245,-117.0533247, 4326)
SELECT OfficeName,
Location.STDistance(@cust) / 1609.34 AS Distance, --convert back to miles
Location.Lat as Lat,
Location.Long as Lng
FROM Offices
where DeliveryArea.STContains(@cust) = 1
ORDER BY Distance asc
And this will give you all the offices you have chosen within the delivery area. The real nice thing about this system is that instead of calculating DeleveryArea
based on location and distance range, you could actually give it a set of points to delineate a non-circular geographic area like a city.
With a well-planned Spatial Index, this solution should work even for your 100,000 location records. If you would like to know more about the benefits of using geography
see this question and the answer to this question .
Here is a SQL script with all the queries above shown in a working example.
source to share
You already have the distance between the office and the target location. Return only those offices whose distance is less than or equal to their delivery radius:
SELECT * FROM (SELECT Id, DeliveryRadius, ( 3960 * acos( cos( radians( @custLat ) ) *
cos( radians( Lat ) ) * cos( radians( Lng ) - radians( @custLng ) ) +
sin( radians( @custLat ) ) * sin( radians( Lat ) ) ) ) AS Distance
FROM Offices)
WHERE Distance <= DeliveryRadius
ORDER BY Distance ASC
Your second question asks how you can separate offices so that the search looks at a subset of offices; I must first make sure that your search is underperforming given your dataset and latency requirements before considering this.
source to share
For @Raduis in miles:
DECLARE @RadiusInDegrees decimal(18,15)
SET @RadiusInDegrees = @Radius / 69.11 * 1.4142
@ Radius / 69.11 gives the radius in degrees, but it needs to be multiplied by the root of 2 to cover the entire square bounding area.
SELECT
@LatitudeLow = @Latitude - @RadiusInDegrees,
@LatitudeHigh = @Latitude + @RadiusInDegrees,
@LongitudeLow = @Longitude - @RadiusInDegrees,
@LongitudeHigh = @Longitude + @RadiusInDegrees
The use of a bounding box limits the number of times expensive precision distance calculations are made.
[...]
WHERE
[Latitude] > @LatitudeLow
AND [Latitude] < @LatitudeHigh
AND [Longitude] > @LongitudeLow
AND [Longitude] < @LongitudeHigh
AND [your exact radius calculation] <=
(CASE WHEN [DeliverlyLimit] < @Radius THEN
[DeliveryLimit] ELSE @Radius END)
Also, for better performance, don't use SELECT * or subqueries if you can avoid it. Return only ids and create a coverage index for ids, latitude, longitude and shipping restrictions. Cache all data from your table into an array of objects, where the index is an id, so the match against the result set is fast.
source to share