Python - unique set of ranges, merge as needed

Is there a data structure that will support a unique set of ranges by combining contiguous or overlapping ranges that are added? I need to keep track of which ranges have been processed, but this can happen in no particular order. For example:.

range_set = RangeSet() # doesn't exist that I know of, this is what I need help with

def process_data(start, end):
    global range_set
    range_set.add_range(start, end)
    # ...

process_data(0, 10)
process_data(20, 30)
process_data(5, 15)
process_data(50, 60)

print(range_set.missing_ranges())
# [[16,19], [31, 49]]

print(range_set.ranges())
# [[0,15], [20,30], [50, 60]]

      

Note that overlapping or adjacent ranges are merged together. What's the best way to do this? I looked at the use of the bisect module, but its use did not strike me as completely clear.

+3


source to share


4 answers


Another approach is based on sympy.sets .



>>> import sympy as sym
>>> a = sym.Interval(1, 2, left_open=False, right_open=False)
>>> b = sym.Interval(3, 4, left_open=False, right_open=False)
>>> domain = sym.Interval(0, 10, left_open=False, right_open=False)
>>> missing = domain - a - b
>>> missing
[0, 1) U (2, 3) U (4, 10]
>>> 2 in missing
False
>>> missing.complement(domain)
[1, 2] U [3, 4]

      

+1


source


You can get some similar functionality with inline data structure set

using pythons; assume that only integer values ​​are valid for start

and end

.

>>> whole_domain = set(range(12))
>>> A = set(range(0,1))
>>> B = set(range(4,9))
>>> C = set(range(3,6))  # processed range(3,5) twice
>>> done = A | B | C
>>> print done
set([0, 3, 4, 5, 6, 7, 8])
>>> missing = whole_domain - done
>>> print missing
set([1, 2, 9, 10, 11])

      

It still lacks many "ranges", but it may be enough.



A simple query, if a specific range has already been processed, might look like this:

>>> isprocessed = [foo in done for foo in set(range(2,6))]
>>> print isprocessed
[False, True, True, True]

      

0


source


I've only tested it lightly, but it looks like you're looking for something like this. You will need to add methods to get the ranges and missing ranges yourself, but this should be very inconvenient since it RangeSet.ranges

is a list of objects Range

stored in sorted order. For a nicer interface, you can write a convenient method that converts it to a list of 2 tuples, for example.

EDIT: I just changed it to use less than or equal comparisons for merging. Note, however, that this will not merge "contiguous" records (for example, it will not merge (1, 5)

and (6, 10)

). To do this, you just need to change the condition in Range.check_merge()

.

import bisect


class Range(object):

    # Reduces memory usage, overkill unless you're using a lot of these.
    __slots__ = ["start", "end"]

    def __init__(self, start, end):
        """Initialise this range."""
        self.start = start
        self.end = end

    def __cmp__(self, other):
        """Sort ranges by their initial item."""
        return cmp(self.start, other.start)

    def check_merge(self, other):
        """Merge in specified range and return True iff it overlaps."""
        if other.start <= self.end and other.end >= self.start:
            self.start = min(other.start, self.start)
            self.end = max(other.end, self.end)
            return True
        return False


class RangeSet(object):

    def __init__(self):
        self.ranges = []

    def add_range(self, start, end):
        """Merge or insert the specified range as appropriate."""
        new_range = Range(start, end)
        offset = bisect.bisect_left(self.ranges, new_range)
        # Check if we can merge backwards.
        if offset > 0 and self.ranges[offset - 1].check_merge(new_range):
            new_range = self.ranges[offset - 1]
            offset -= 1
        else:
            self.ranges.insert(offset, new_range)
        # Scan for forward merges.
        check_offset = offset + 1
        while (check_offset < len(self.ranges) and
                new_range.check_merge(self.ranges[offset+1])):
            check_offset += 1
        # Remove any entries that we've just merged.
        if check_offset - offset > 1:
            self.ranges[offset+1:check_offset] = []

      

0


source


You have applied a good solution in your use case. Instead of trying to maintain a set of bands that were used, keep track of the bands that were not used. This makes the task difficult.

class RangeSet:
    def __init__(self, min, max):
        self.__gaps = [(min, max)]
        self.min = min
        self.max = max

    def add(self, lo, hi):
        new_gaps = []
        for g in self.__gaps:
            for ng in (g[0],min(g[1],lo)),(max(g[0],hi),g[1]):
                if ng[1] > ng[0]: new_gaps.append(ng)
        self.__gaps = new_gaps

    def missing_ranges(self):
        return self.__gaps

    def ranges(self):
        i = iter([self.min] + [x for y in self.__gaps for x in y] + [self.max])
        return [(x,y) for x,y in zip(i,i) if y > x]

      

The magic lies in the method add

that checks each existing whitespace to see if it affects the new range and adjusts the whitespace list accordingly.

Note that the behavior of tuples used for ranges here is the same as Python objects range

, that is, they include value start

and exclude value stop

. This class will not behave exactly as you described in your question, where your ranges seem to contain both.

0


source







All Articles