More efficient way to count the number of values in an interval?

Question

More efficient way to count the number of values in an interval?

I want to determine how many numbers of the input array (up to 50,000) lie in each of my given intervals (many).

I am currently trying to do it with this algorithm, but it is too slow:

Array example: {-3, 10, 5, 4, -999999, 999999, 6000}
Interval example: [0, 11] (inclusive)

Sorting an array - O(n * log(n))

. (-999999, -3, 4, 5, 10, 6000, 999999).
Find min_index: array[min_index] >= 0

- O(n)

. (for my example, min_index == 2).
Find max_index: array[max_index] <= 11

- O(n)

. (for my example, max_index == 4).
If both indexes exist, then Result == right_index - left_index + 1

(for my example, Result = (4 - 2 + 1) = 3).

+3

c ++ sorting arrays algorithm search

Max 21 oct. '14 at 21:00

source to share

4 answers

There is a simple O (n _input * m _interval):

To simplify implementation, we use half-open spans. Convert them as needed.

Convert intervals to half-open intervals (always prefer half-open intervals)
Store all constraints in the array.
For all input elements
- For all elements in the array limit
  - Increase counter if input is less than limit
Go through your intervals and get the answers by subtracting the counts for the respective limits.

For a slight performance improvement, sort the limits array in step 2.

+1

Deduplicator 21 oct. 14 at 21:09

source to share

Create a std :: map of your numbers to your index in the sorted array.

From your example map [-999999] = 0, map [-3] = 1, ... map [999999] = 7.

To find the interval, find the smallest number greater than or equal to min (using map.lower_bound ()) and find the first number above max (using map.upper_bound ()).

Now you can subtract the subscript from the superscript to find the number of elements in that range in O (log n).

+1

patros 21 oct. 14 at 21:25

source to share

typedef std::pair<int,int> interval;
typedef std::map<interval,size_t> answers;
typedef std::vector<interval> questions;

// O((m+n)lg m)
answers solve( std::vector<int>& data, questions const& qs ){
  // m = qs.size()
  // n = data.size()

  answers retval;
  std::vector<std::pair<int, size_t>> edges;
  edges.reserve( q.size()+1 );
  // O(m) -- all start and ends of intervals is in edges
  for ( auto q:qs ) {
    edges.emplace_back( q.first, 0 );
    edges.emplace_back( q.second, 0 );
  }
  // O(mlgm) -- sort
  std::sort(begin(edges),end(edges));
  edges.emplace_back( std::numeric_limits<int>::max(), 0 );
  // O(m) -- remove duplicates
  edges.erase(std::unique(begin(edges),end(edges)),end(edges));
  // O(n lg m) -- count the number of elements < a given edge:
  for(int x:data ){
    auto it = std::lower_bound( begin(edges), end(edges), std::make_pair(x,0) );
    it->second++;
  }
  // O(m)
  size_t accum = 0;
  for(auto& e:edges) {
    accum += edges.second;
    edges.second = accum;
  }
  // now edge (x,y) states that there are y elements < x.

  // O(n lg m) -- find the edge corresponding
  for(auto q:questions){
    auto low = std::lower_bound(begin(edges), end(edges),
      std::make_pair(q.first, size_t(0))
    );
    auto high = std::upper_bound(begin(edges), end(edges),
      std::make_pair(q.second, size_t(0))
    }
    size_t total = high->second - low->second;
    answers.emplace(q,total);
  }
  return answers;
}

O((n+m)lg m)

, where n is an integer number, m is the number of intervals, and x is the average number of intervals, each interval of which overlaps with.

+1

Yakk 22 oct. 14 at 12:13

source to share

Tacet · Accepted Answer · 2014-10-21T21:10:15+0000

You have a good idea, but it needs to be tweaked. You have to find the beginning and end of an interval in O(lg n)

using binary search . If n is the length of the array and q is the number of questions [a, b]

, you have O (n + q * n) time, in binary search this is O((n + q) lg n)

( n lg n

from the sort array). The advantage of this solution is simplicity because C ++ has std :: lower_bound and std :: upper_bound . And you can use std :: distance . These are just a few lines of code.

If q is n, this algorithm has complexity O(n lg n)

. It could be better? It's my pleasure. What for? Because the problem is equivalent to sorting. As you know, it is impossible to get the best computational complexity. (Sort by comparison.)

More efficient way to count the number of values ​​in an interval?

More articles:

More efficient way to count the number of values in an interval?