More efficient way to count the number of values ​​in an interval?

I want to determine how many numbers of the input array (up to 50,000) lie in each of my given intervals (many).

I am currently trying to do it with this algorithm, but it is too slow:

Array example: {-3, 10, 5, 4, -999999, 999999, 6000}
Interval example: [0, 11] (inclusive)

  • Sorting an array - O(n * log(n))

    . (-999999, -3, 4, 5, 10, 6000, 999999).
  • Find min_index: array[min_index] >= 0

    - O(n)

    . (for my example, min_index == 2).
  • Find max_index: array[max_index] <= 11

    - O(n)

    . (for my example, max_index == 4).
  • If both indexes exist, then Result == right_index - left_index + 1

    (for my example, Result = (4 - 2 + 1) = 3).
+3


source to share


4 answers


You have a good idea, but it needs to be tweaked. You have to find the beginning and end of an interval in O(lg n)

using binary search . If n is the length of the array and q is the number of questions [a, b]

, you have O (n + q * n) time, in binary search this is O((n + q) lg n)

( n lg n

from the sort array). The advantage of this solution is simplicity because C ++ has std :: lower_bound and std :: upper_bound . And you can use std :: distance . These are just a few lines of code.



If q is n, this algorithm has complexity O(n lg n)

. It could be better? It's my pleasure. What for? Because the problem is equivalent to sorting. As you know, it is impossible to get the best computational complexity. (Sort by comparison.)

+2


source


There is a simple O (n input * m interval):

To simplify implementation, we use half-open spans. Convert them as needed.



  • Convert intervals to half-open intervals (always prefer half-open intervals)
  • Store all constraints in the array.
  • For all input elements
    • For all elements in the array limit
      • Increase counter if input is less than limit
  • Go through your intervals and get the answers by subtracting the counts for the respective limits.

For a slight performance improvement, sort the limits array in step 2.

+1


source


Create a std :: map of your numbers to your index in the sorted array.

From your example map [-999999] = 0, map [-3] = 1, ... map [999999] = 7.

To find the interval, find the smallest number greater than or equal to min (using map.lower_bound ()) and find the first number above max (using map.upper_bound ()).

Now you can subtract the subscript from the superscript to find the number of elements in that range in O (log n).

+1


source


typedef std::pair<int,int> interval;
typedef std::map<interval,size_t> answers;
typedef std::vector<interval> questions;

// O((m+n)lg m)
answers solve( std::vector<int>& data, questions const& qs ){
  // m = qs.size()
  // n = data.size()

  answers retval;
  std::vector<std::pair<int, size_t>> edges;
  edges.reserve( q.size()+1 );
  // O(m) -- all start and ends of intervals is in edges
  for ( auto q:qs ) {
    edges.emplace_back( q.first, 0 );
    edges.emplace_back( q.second, 0 );
  }
  // O(mlgm) -- sort
  std::sort(begin(edges),end(edges));
  edges.emplace_back( std::numeric_limits<int>::max(), 0 );
  // O(m) -- remove duplicates
  edges.erase(std::unique(begin(edges),end(edges)),end(edges));
  // O(n lg m) -- count the number of elements < a given edge:
  for(int x:data ){
    auto it = std::lower_bound( begin(edges), end(edges), std::make_pair(x,0) );
    it->second++;
  }
  // O(m)
  size_t accum = 0;
  for(auto& e:edges) {
    accum += edges.second;
    edges.second = accum;
  }
  // now edge (x,y) states that there are y elements < x.

  // O(n lg m) -- find the edge corresponding
  for(auto q:questions){
    auto low = std::lower_bound(begin(edges), end(edges),
      std::make_pair(q.first, size_t(0))
    );
    auto high = std::upper_bound(begin(edges), end(edges),
      std::make_pair(q.second, size_t(0))
    }
    size_t total = high->second - low->second;
    answers.emplace(q,total);
  }
  return answers;
}

      

O((n+m)lg m)

, where n is an integer number, m is the number of intervals, and x is the average number of intervals, each interval of which overlaps with.

+1


source







All Articles