Scalability Location Distance Search Over 100,000 LatLng positions in the USA

Scenario =

1) Delivery offices are distributed across the United States with a maximum delivery radius limit in miles.

2) The geo target address converted to LatLng is the delivery destination.

Target = Return a dataset of delivery offices (1) for which the delivery limit is at this distance to the target address (2)

Attempt =

As a starting point for my problem, I use a great example of a Storm consultant to determine the closest office for a client: Haversine distance between two points

The table "My offices" stores the address of the office together with its Lat and Lng values, as well as their maximum "deliveryLimit" distance.

The SQL for calculating Haversine is blowing my mind and is currently beyond my power!

The stormy SQL is below, but instead of only selecting one row from the direct calculations, I need to select all rows of offices whose maximum delivery limit is less than the distance between office and client.

Question1 = How do I add a maximum distance filter to the SQL query so that it returns offices with a delivery zone that includes the target location?

Question2 = How can I limit the number of offices requested by those that can actually be located in the US target area? For example, if the target location is Boise, Idaho and the Los Angeles offices, California has a delivery distance limit of 300 miles. There is no point in even asking for such offices. However, the offices in Washington; Oregon and Northern Nevada state that Idaho boundaries should be included in the search query, as some may have maximum distance values ​​that reach this Boise, ID example.

SQL for Haversine used by Storm:

SELECT TOP 1 *, ( 3960 * acos( cos( radians( @custLat ) ) *
cos( radians( Lat ) ) * cos( radians(  Lng  ) - radians( @custLng ) ) +
sin( radians( @custLat ) ) * sin( radians(  Lat  ) ) ) ) AS Distance
FROM Offices
ORDER BY Distance ASC

      

The above SQL example only selects the closest office to the target lat / long (@custLng)

The storm approached the distance calculation from two different directions. The SQL above was the first. The second way was to store the coordinates of the office in a list in memory and create a method with a function to cycle through the list, calculating the distances and finally choosing the closest one, for example:

/// <summary>
/// Returns the distance in miles or kilometers of any two
/// latitude / longitude points.
/// </summary>
/// <param name="pos1">Location 1</param>
/// <param name="pos2">Location 2</param>
/// <param name="unit">Miles or Kilometers</param>
/// <returns>Distance in the requested unit</returns>
public double HaversineDistance(LatLng pos1, LatLng pos2, DistanceUnit unit)
{
    double R = (unit == DistanceUnit.Miles) ? 3960 : 6371;
    var lat = (pos2.Latitude - pos1.Latitude).ToRadians();
    var lng = (pos2.Longitude - pos1.Longitude).ToRadians();
    var h1 = Math.Sin(lat / 2) * Math.Sin(lat / 2) +
              Math.Cos(pos1.Latitude.ToRadians()) * 
              Math.Cos(pos2.Latitude.ToRadians()) *
              Math.Sin(lng / 2) * Math.Sin(lng / 2);
    var h2 = 2 * Math.Asin(Math.Min(1, Math.Sqrt(h1)));
    return R * h2;
}

public enum DistanceUnit { Miles, Kilometers };

      

and

var Offices = GetMyOfficeList();
for(int i = 0; i< Offices.Count; i++)
{
    Offices[i].Distance = HaversineDistance(
                        coord,
                        new LatLng(Offices[i].Lat, Offices[i].Lng),
                        DistanceUnit.Miles);
}

var closestOffice = Offices.OrderBy(x => x.Distance).Take(1).Single();

      

Scalability is important as my script can easily get much more than 100,000 offices, so listing office numbers in a layout is unlikely!

+3


source to share


3 answers


If you are using Sql2008 or newer, it has special types built in to make what you want to do easier. The main type you need to use isgeography

I'm going to make some guesswork on your table structure, but the main thing is that you have Location

andDeleveryArea

create table Offices
(
  OfficeName varchar(40),
  Location geography,
  DeliveryDistance float, --stored in miles
  --If you are on SQL2008 or 2008R2 replace BufferWithCurves with one of the older Buffer functions
  DeliveryArea as Location.BufferWithCurves(DeliveryDistance * 1609.34) PERSISTED, --1609.34 converts miles to meters 
)

      

I used BufferWithCurves in my example above, but this is only available on Sql2012 and newer, if you are on 2008 or 2008R2 you will need to use BufferWithTolerance or STBuffer, or just manually define your own scope manually in the insert status.

Now, to populate the data, because we made a DeliveryArea

computed persistent column, it's actually pretty easy to do. All you have to do is set the location of the office and the radius of its delivery area and it will calculate the circle for you. I will use the examples given in your example:

insert into Offices (OfficeName, Location, DeliveryDistance) 
values ('Boise, ID', 
        geography::Point(43.6187102,-116.2146068, 4326), --4326 represents a "lat and long" coordinate system
        300
       )

insert into Offices (OfficeName, Location,DeliveryDistance) 
values ('LA, CA', 
        geography::Point(34.0204989,-118.4117325, 4326),
        300
       )

insert into Offices (OfficeName, Location,DeliveryDistance) 
values ('Walla Walla, WI', 
        geography::Point(46.0627549,-118.3337259, 4326),
        300
       )

insert into Offices (OfficeName, Location,DeliveryDistance) 
values ('Baker City, OR', 
        geography::Point(44.7746169,-117.8317284, 4326),
        300
       )

insert into Offices (OfficeName, Location,DeliveryDistance) 
values ('Elko, NV', 
        geography::Point(40.846931,-115.7669825, 4326),
        300
       )

      



Now for your request, if you want to find which service zones are served by Jordan Vally, Oregon (42.9740245, -117.0533247), your request will simply be

declare @cust geography = geography::Point(42.9740245,-117.0533247, 4326)

SELECT OfficeName, 
       Location.STDistance(@cust) / 1609.34 AS Distance, --convert back to miles
       Location.Lat as Lat,
       Location.Long as Lng
FROM Offices
where  DeliveryArea.STContains(@cust) = 1
ORDER BY Distance asc

      

And this will give you all the offices you have chosen within the delivery area. The real nice thing about this system is that instead of calculating DeleveryArea

based on location and distance range, you could actually give it a set of points to delineate a non-circular geographic area like a city.

With a well-planned Spatial Index, this solution should work even for your 100,000 location records. If you would like to know more about the benefits of using geography

see this question and the answer to this question .

Here is a SQL script with all the queries above shown in a working example.

+3


source


You already have the distance between the office and the target location. Return only those offices whose distance is less than or equal to their delivery radius:

SELECT * FROM (SELECT Id, DeliveryRadius, ( 3960 * acos( cos( radians( @custLat ) ) *
    cos( radians( Lat ) ) * cos( radians(  Lng  ) - radians( @custLng ) ) +
    sin( radians( @custLat ) ) * sin( radians(  Lat  ) ) ) ) AS Distance
    FROM Offices)
WHERE Distance <= DeliveryRadius
ORDER BY Distance ASC

      



Your second question asks how you can separate offices so that the search looks at a subset of offices; I must first make sure that your search is underperforming given your dataset and latency requirements before considering this.

+1


source


For @Raduis in miles:

DECLARE @RadiusInDegrees decimal(18,15)
SET @RadiusInDegrees = @Radius / 69.11 * 1.4142

      

@ Radius / 69.11 gives the radius in degrees, but it needs to be multiplied by the root of 2 to cover the entire square bounding area.

SELECT
    @LatitudeLow = @Latitude - @RadiusInDegrees,
    @LatitudeHigh = @Latitude + @RadiusInDegrees,       
    @LongitudeLow = @Longitude - @RadiusInDegrees,
    @LongitudeHigh = @Longitude + @RadiusInDegrees

      

The use of a bounding box limits the number of times expensive precision distance calculations are made.

[...]
WHERE
    [Latitude] > @LatitudeLow  
    AND [Latitude] < @LatitudeHigh 
    AND [Longitude] > @LongitudeLow
    AND [Longitude] < @LongitudeHigh                
    AND [your exact radius calculation] <= 
      (CASE WHEN [DeliverlyLimit] < @Radius THEN 
          [DeliveryLimit] ELSE @Radius END)

      

Also, for better performance, don't use SELECT * or subqueries if you can avoid it. Return only ids and create a coverage index for ids, latitude, longitude and shipping restrictions. Cache all data from your table into an array of objects, where the index is an id, so the match against the result set is fast.

+1


source







All Articles