Data structure for storing IP ranges to quickly search for a given IP (Java)

I am trying to present an efficient data structure for representing IP ranges as described below. I know what I want to try, pretty easy, I just can't seem to show up on it.

So, let's say I have hundreds of separate IP ranges in the format 1.1.1.0 - 1.1.2.255 or whatever (but not CIDR format like 1.1.1.0/24).

The different ranges are not sequential, so there could be millions of IP addresses between the end of one and the start of the next. They could / would have been represented in integer format instead if preferred (i.e. 16843008 - 16843519 in this example).

There would be no overlap of IP addresses in other ranges.

Basically, these ranges are ASN network blocks, if you're interested. And I need to make a tool to determine if any given IP address fits in one of these ranges, but the tool should be pretty fast (less than 0.5 seconds, ideally).

Now if I have hundreds or thousands of these ranges that cover millions of IPs and you want to know if a given IP is in one of the ranges (or not), which would be the fastest way and also not too memory intensive?

There are several options I can think of:

  • Create a HashSet that contains every IP from all ranges and just create (ip) with it. I would expect around 50 million IP addresses. Fast, but seems a little wasteful, memory wise?

  • You have a TreeMap whose key is the starting IP address of each range and whose value is the ending IP address. Walk through the tree and check each key if the test IP is greater than this key but less than the next key. If so, then investigate the value (i.e. the end of the IP range), and if the IP is less than the value of the card, then the IP is in the range - if not, then there is no point in continuing and might assume the IP is not in any of the ranges ... Can a binary search through the keys of the tree jump to the output faster rather than checking the order?

  • Another idea is to have HashMap keys that will be all possible subnets in all ranges (I understand there will be a lot of them), for example "123.123.123, 123.123.124, 123.123.125, 211.211.211, 211.211.215" etc. .d. Then if I am asked to check IP 123.123.124.144, I can see if its subnet (123.123.124) is the key on the map first. The map value can be a custom object containing the starting and ending IP addresses of the range associated with this particular subnet. Then you can just use that to check if the full IP address is in the range. This special object will be shared by many entries on the map, since it is obvious that there can be many subnets in a given range.

So, any thoughts / ideas / opinions? I have a feeling for my second idea, what might be a good way to go? Thanks for the info ... really glad to hear your ideas!

+3


source to share


4 answers


If the ranges do not contain sub-ranges, you can check the guava RangeSet.
https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained#RangeSet
In fact, I haven't analyzed the time and space complexity of the RangeSet, but the RangeSet seems to fit your requirements well enough.



+3


source


I am using an AVL tree with an IP range as a node value and a suitable comparison function. (Where the range is a..b (a <= b) when comparing two ranges r1 and r2: r1 <r2 if r1.b <r2.a; r1 "==" r2 if r1.a> = r2.a and r1.b = r2.b; r1> r2 if r1.a> r2.b. So, "==" means r1 is equal to or included in r2.)

If you don't have a match, that's enough. If you have overlaps (like mine, but I handle network prefixes), you end up with AVL trees nested within AVL trees.



When you say that there is no overlap of ASN network blocks, I am assuming that if the ASN has / yy delegated to it, you split parent / xx into separate but contiguous network blocks.

Since your network block list doesn't change that often, you probably don't need an AVL tree. You can just sort the netblocks and get listed with a binary chop. If you need something that is faster than a binary tree / chop, you can have an auxiliary set of pointers to the binary chop using the ms byte of the start of the search range to determine the first and last ranges to look for in.

+1


source


This is the structure I am using. Suppose there is another table in this situation, a table locations

to see the purpose and use of IP ranges in a real situation.

--
-- Table structure for table `locations`
--
CREATE TABLE IF NOT EXISTS `locations` (
  `location_id` int(10) unsigned NOT NULL,
  `parent_id` int(10) unsigned NOT NULL,
  `location_name` varchar(64) NOT NULL,
  PRIMARY KEY (`location_id`),
  KEY `parent_id` (`parent_id`)
);

--
-- Table structure for table `locations_to_ip_ranges`
--
CREATE TABLE IF NOT EXISTS `locations_to_ip_ranges` (
  `l_ip_r` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `location_id` int(10) unsigned NOT NULL,
  `starting_ip` varchar(45) NOT NULL,
  `ending_ip` varchar(45) NOT NULL,
  `starting_cidr` int(10) unsigned NOT NULL,
  `ending_cidr` int(10) unsigned NOT NULL,
  PRIMARY KEY (`l_ip_r`),
  KEY `location_id` (`location_id`)
);

      

Here are some records from the second table

l_ip_r  location_id starting_ip     ending_ip        starting_cidr  ending_cidr
-------------------------------------------------------------------------------
94005   47          217.147.0.0     217.147.15.255   3650289664     3650293759
94004   47          217.146.32.0    217.146.47.255   3650232320     3650236415
94003   47          217.145.144.0   217.145.159.255  3650195456     3650199551
94002   47          217.145.16.0    217.145.31.255   3650162688     3650166783
94001   47          217.144.176.0   217.144.191.255  3650138112     3650142207

      

Below is a useful function to convert an IP address to a CIDR number. It is written in PHP, but I believe it would be easy to convert it to Java. The function used here explode()

splits the string according to the specified delimiter.

function ip_address_to_cidr($ip_address){
    $ips = explode(".", $ip_address);
    return ($ips[3] + $ips[2] * 256 + $ips[1] * 65536 + $ips[0] * 16777216);
}

      

So, if you want to get the country for a given IP, you would have something like this

// call the ip_ip_address_to_cidr function passing the remote_address as an argument
$cidr = ip_address_to_cidr($_SERVER['REMOTE_ADDR']);

// pass the returned $cidr to the following query and get the location_id
$set = mysql_query("
    SELECT location_id 
    FROM locations_to_ip_ranges
    WHERE " . $cidr . " BETWEEN starting_cidr AND ending_cidr
");
$row = mysql_fetch_object($set);
echo $row->location_id 

      

Since I need multi-language support, there is another table locations_to_languages

I left behind to keep it simple and straightforward. These tables currently contain tens of millions of data and I have no performance issues.

Side note: I haven't used Java for a long time, so the snippet above is in PHP, but I find it would not be easy to understand the logic and convert it to Java if needed.

0


source


We can use btrees through which we can map the ip address in primary memory and then map them to secondary memory. Since we know that we need to store a large enough number of ip addresses, we cannot store them in primary memory, since this is the case, it is better if we use btree. Since it is similar to the hashing principle, it is also memory efficient.

0


source







All Articles