Create rating for vector double
I have a vector with doubles that I want to rank (it is actually a vector with double member objects called costs
). If there are only unique values, or I ignore ambiguous values, then no problem. However, I want to use the average rank for non-single values. Also, I found some question on SO about ranks, however they ignore non-unique values.
For example, say that we have (1, 5, 4, 5, 5), then the corresponding ranks should be (1, 4, 2, 4, 4). When we ignore unequal values, the ranks are (1, 3, 2, 4, 5).
When ignoring ambiguous values, I used the following:
void Population::create_ranks_costs(vector<Solution> &pop)
{
size_t const n = pop.size();
// Create an index vector
vector<size_t> index(n);
iota(begin(index), end(index), 0);
sort(begin(index), end(index),
[&pop] (size_t idx, size_t idy) {
return pop[idx].costs() < pop[idy].costs();
});
// Store the result in the corresponding solutions
for (size_t idx = 0; idx < n; ++idx)
pop[index[idx]].set_rank_costs(idx + 1);
}
Does anyone know how to account for non-unique values? I prefer to use std::algorithm
as IMO leads to code cleanup.
source to share
One way to do this is to use multimap
.
-
Put the elements in a multimap by mapping your objects to
size_t
(intial values ββdon't matter). You can do it with one line (use ctor, which accepts iterators). -
Loop (either explicitly or using anything from
algorithm
) and assign the values ββ0, 1, ... as values. -
Scroll through individual keys. For each individual key, call
equal_range
for the key and set its values ββto average (again you can use stuff fromalgorithm
for this).
The total complexity should be theta (n log (n)), where n is the length of the vector.
source to share
Here is a subroutine for vectors, as the title of the question suggests:
template<typename Vector>
std::vector<double> rank(const Vector& v)
{
std::vector<std::size_t> w(v.size());
std::iota(begin(w), end(w), 0);
std::sort(begin(w), end(w),
[&v](std::size_t i, std::size_t j) { return v[i] < v[j]; });
std::vector<double> r(w.size());
for (std::size_t n, i = 0; i < w.size(); i += n)
{
n = 1;
while (i + n < w.size() && v[w[i]] == v[w[i+n]]) ++n;
for (std::size_t k = 0; k < n; ++k)
{
r[w[i+k]] = i + (n + 1) / 2.0; // average rank of n tied values
// r[w[i+k]] = i + 1; // min
// r[w[i+k]] = i + n; // max
// r[w[i+k]] = i + k + 1; // random order
}
}
return r;
}
See IDEone for a working example .
For ranks with related (equal) values, there are various conventions (min, max, average rank, or random order). Pick one of them in the innermost loop (average rank is common in statistics, minimum rank in sports).
Please note that the averaged ranks can be non-integer ( n+0.5
). I don't know if rounding to the integral rank n
is a problem for your application.
The algorithm can easily be generalized to custom orders such as pop[i].costs()
with std::less<>
default.
source to share
Something like this:
size_t run_start = 0;
double run_cost = pop[index[0]].costs();
for (size_t idx = 1; idx <= n; ++idx) {
double new_cost = idx < n ? pop[index[idx]].costs() : 0;
if (idx == n || new_cost != run_cost) {
double avg_rank = (run_start + 1 + idx) / 2.0;
for (size_t j = run_start; j < idx; ++j) {
pop[index[j]].set_rank_costs(avg_rank);
}
run_start = idx;
run_cost = new_cost;
}
}
Basically, you are iterating over the sorted sequence and identifying runs of equal values ββ(possibly of length of length 1). For each such run, you calculate its average rank and set it for all elements in the run.
source to share