More efficient way to count the number of values in an interval?
I want to determine how many numbers of the input array (up to 50,000) lie in each of my given intervals (many).
I am currently trying to do it with this algorithm, but it is too slow:
Array example: {-3, 10, 5, 4, -999999, 999999, 6000}
Interval example: [0, 11] (inclusive)
- Sorting an array -
O(n * log(n))
. (-999999, -3, 4, 5, 10, 6000, 999999). - Find min_index:
array[min_index] >= 0
-O(n)
. (for my example, min_index == 2). - Find max_index:
array[max_index] <= 11
-O(n)
. (for my example, max_index == 4). - If both indexes exist, then
Result == right_index - left_index + 1
(for my example, Result = (4 - 2 + 1) = 3).
source to share
You have a good idea, but it needs to be tweaked. You have to find the beginning and end of an interval in O(lg n)
using binary search . If n is the length of the array and q is the number of questions [a, b]
, you have O (n + q * n) time, in binary search this is O((n + q) lg n)
( n lg n
from the sort array). The advantage of this solution is simplicity because C ++ has std :: lower_bound and std :: upper_bound . And you can use std :: distance . These are just a few lines of code.
If q is n, this algorithm has complexity O(n lg n)
. It could be better? It's my pleasure. What for? Because the problem is equivalent to sorting. As you know, it is impossible to get the best computational complexity. (Sort by comparison.)
source to share
There is a simple O (n input * m interval):
To simplify implementation, we use half-open spans. Convert them as needed.
- Convert intervals to half-open intervals (always prefer half-open intervals)
- Store all constraints in the array.
- For all input elements
- For all elements in the array limit
- Increase counter if input is less than limit
- For all elements in the array limit
- Go through your intervals and get the answers by subtracting the counts for the respective limits.
For a slight performance improvement, sort the limits array in step 2.
source to share
Create a std :: map of your numbers to your index in the sorted array.
From your example map [-999999] = 0, map [-3] = 1, ... map [999999] = 7.
To find the interval, find the smallest number greater than or equal to min (using map.lower_bound ()) and find the first number above max (using map.upper_bound ()).
Now you can subtract the subscript from the superscript to find the number of elements in that range in O (log n).
source to share
typedef std::pair<int,int> interval;
typedef std::map<interval,size_t> answers;
typedef std::vector<interval> questions;
// O((m+n)lg m)
answers solve( std::vector<int>& data, questions const& qs ){
// m = qs.size()
// n = data.size()
answers retval;
std::vector<std::pair<int, size_t>> edges;
edges.reserve( q.size()+1 );
// O(m) -- all start and ends of intervals is in edges
for ( auto q:qs ) {
edges.emplace_back( q.first, 0 );
edges.emplace_back( q.second, 0 );
}
// O(mlgm) -- sort
std::sort(begin(edges),end(edges));
edges.emplace_back( std::numeric_limits<int>::max(), 0 );
// O(m) -- remove duplicates
edges.erase(std::unique(begin(edges),end(edges)),end(edges));
// O(n lg m) -- count the number of elements < a given edge:
for(int x:data ){
auto it = std::lower_bound( begin(edges), end(edges), std::make_pair(x,0) );
it->second++;
}
// O(m)
size_t accum = 0;
for(auto& e:edges) {
accum += edges.second;
edges.second = accum;
}
// now edge (x,y) states that there are y elements < x.
// O(n lg m) -- find the edge corresponding
for(auto q:questions){
auto low = std::lower_bound(begin(edges), end(edges),
std::make_pair(q.first, size_t(0))
);
auto high = std::upper_bound(begin(edges), end(edges),
std::make_pair(q.second, size_t(0))
}
size_t total = high->second - low->second;
answers.emplace(q,total);
}
return answers;
}
O((n+m)lg m)
, where n is an integer number, m is the number of intervals, and x is the average number of intervals, each interval of which overlaps with.
source to share