Python - unique set of ranges, merge as needed
Is there a data structure that will support a unique set of ranges by combining contiguous or overlapping ranges that are added? I need to keep track of which ranges have been processed, but this can happen in no particular order. For example:.
range_set = RangeSet() # doesn't exist that I know of, this is what I need help with
def process_data(start, end):
global range_set
range_set.add_range(start, end)
# ...
process_data(0, 10)
process_data(20, 30)
process_data(5, 15)
process_data(50, 60)
print(range_set.missing_ranges())
# [[16,19], [31, 49]]
print(range_set.ranges())
# [[0,15], [20,30], [50, 60]]
Note that overlapping or adjacent ranges are merged together. What's the best way to do this? I looked at the use of the bisect module, but its use did not strike me as completely clear.
source to share
Another approach is based on sympy.sets .
>>> import sympy as sym
>>> a = sym.Interval(1, 2, left_open=False, right_open=False)
>>> b = sym.Interval(3, 4, left_open=False, right_open=False)
>>> domain = sym.Interval(0, 10, left_open=False, right_open=False)
>>> missing = domain - a - b
>>> missing
[0, 1) U (2, 3) U (4, 10]
>>> 2 in missing
False
>>> missing.complement(domain)
[1, 2] U [3, 4]
source to share
You can get some similar functionality with inline data structure set
using pythons; assume that only integer values ββare valid for start
and end
.
>>> whole_domain = set(range(12))
>>> A = set(range(0,1))
>>> B = set(range(4,9))
>>> C = set(range(3,6)) # processed range(3,5) twice
>>> done = A | B | C
>>> print done
set([0, 3, 4, 5, 6, 7, 8])
>>> missing = whole_domain - done
>>> print missing
set([1, 2, 9, 10, 11])
It still lacks many "ranges", but it may be enough.
A simple query, if a specific range has already been processed, might look like this:
>>> isprocessed = [foo in done for foo in set(range(2,6))]
>>> print isprocessed
[False, True, True, True]
source to share
I've only tested it lightly, but it looks like you're looking for something like this. You will need to add methods to get the ranges and missing ranges yourself, but this should be very inconvenient since it RangeSet.ranges
is a list of objects Range
stored in sorted order. For a nicer interface, you can write a convenient method that converts it to a list of 2 tuples, for example.
EDIT: I just changed it to use less than or equal comparisons for merging. Note, however, that this will not merge "contiguous" records (for example, it will not merge (1, 5)
and (6, 10)
). To do this, you just need to change the condition in Range.check_merge()
.
import bisect
class Range(object):
# Reduces memory usage, overkill unless you're using a lot of these.
__slots__ = ["start", "end"]
def __init__(self, start, end):
"""Initialise this range."""
self.start = start
self.end = end
def __cmp__(self, other):
"""Sort ranges by their initial item."""
return cmp(self.start, other.start)
def check_merge(self, other):
"""Merge in specified range and return True iff it overlaps."""
if other.start <= self.end and other.end >= self.start:
self.start = min(other.start, self.start)
self.end = max(other.end, self.end)
return True
return False
class RangeSet(object):
def __init__(self):
self.ranges = []
def add_range(self, start, end):
"""Merge or insert the specified range as appropriate."""
new_range = Range(start, end)
offset = bisect.bisect_left(self.ranges, new_range)
# Check if we can merge backwards.
if offset > 0 and self.ranges[offset - 1].check_merge(new_range):
new_range = self.ranges[offset - 1]
offset -= 1
else:
self.ranges.insert(offset, new_range)
# Scan for forward merges.
check_offset = offset + 1
while (check_offset < len(self.ranges) and
new_range.check_merge(self.ranges[offset+1])):
check_offset += 1
# Remove any entries that we've just merged.
if check_offset - offset > 1:
self.ranges[offset+1:check_offset] = []
source to share
You have applied a good solution in your use case. Instead of trying to maintain a set of bands that were used, keep track of the bands that were not used. This makes the task difficult.
class RangeSet:
def __init__(self, min, max):
self.__gaps = [(min, max)]
self.min = min
self.max = max
def add(self, lo, hi):
new_gaps = []
for g in self.__gaps:
for ng in (g[0],min(g[1],lo)),(max(g[0],hi),g[1]):
if ng[1] > ng[0]: new_gaps.append(ng)
self.__gaps = new_gaps
def missing_ranges(self):
return self.__gaps
def ranges(self):
i = iter([self.min] + [x for y in self.__gaps for x in y] + [self.max])
return [(x,y) for x,y in zip(i,i) if y > x]
The magic lies in the method add
that checks each existing whitespace to see if it affects the new range and adjusts the whitespace list accordingly.
Note that the behavior of tuples used for ranges here is the same as Python objects range
, that is, they include value start
and exclude value stop
. This class will not behave exactly as you described in your question, where your ranges seem to contain both.
source to share