Create rating for vector double

I have a vector with doubles that I want to rank (it is actually a vector with double member objects called costs

). If there are only unique values, or I ignore ambiguous values, then no problem. However, I want to use the average rank for non-single values. Also, I found some question on SO about ranks, however they ignore non-unique values.

For example, say that we have (1, 5, 4, 5, 5), then the corresponding ranks should be (1, 4, 2, 4, 4). When we ignore unequal values, the ranks are (1, 3, 2, 4, 5).

When ignoring ambiguous values, I used the following:

void Population::create_ranks_costs(vector<Solution> &pop)
{
  size_t const n = pop.size();

  // Create an index vector
  vector<size_t> index(n);
  iota(begin(index), end(index), 0);

  sort(begin(index), end(index), 
       [&pop] (size_t idx, size_t idy) { 
         return pop[idx].costs() < pop[idy].costs();
       });

  // Store the result in the corresponding solutions
  for (size_t idx = 0; idx < n; ++idx)
    pop[index[idx]].set_rank_costs(idx + 1);
}

      

Does anyone know how to account for non-unique values? I prefer to use std::algorithm

as IMO leads to code cleanup.

+3


source to share


3 answers


One way to do this is to use multimap

.

  • Put the elements in a multimap by mapping your objects to size_t

    (intial values ​​don't matter). You can do it with one line (use ctor, which accepts iterators).

  • Loop (either explicitly or using anything from algorithm

    ) and assign the values ​​0, 1, ... as values.

  • Scroll through individual keys. For each individual key, call equal_range

    for the key and set its values ​​to average (again you can use stuff from algorithm

    for this).



The total complexity should be theta (n log (n)), where n is the length of the vector.

+2


source


Here is a subroutine for vectors, as the title of the question suggests:

template<typename Vector>
std::vector<double> rank(const Vector& v)
{
    std::vector<std::size_t> w(v.size());
    std::iota(begin(w), end(w), 0);
    std::sort(begin(w), end(w), 
        [&v](std::size_t i, std::size_t j) { return v[i] < v[j]; });

    std::vector<double> r(w.size());
    for (std::size_t n, i = 0; i < w.size(); i += n)
    {
        n = 1;
        while (i + n < w.size() && v[w[i]] == v[w[i+n]]) ++n;
        for (std::size_t k = 0; k < n; ++k)
        {
            r[w[i+k]] = i + (n + 1) / 2.0; // average rank of n tied values
            // r[w[i+k]] = i + 1;          // min 
            // r[w[i+k]] = i + n;          // max
            // r[w[i+k]] = i + k + 1;      // random order
        }
    }
    return r;
}

      

See IDEone for a working example .



For ranks with related (equal) values, there are various conventions (min, max, average rank, or random order). Pick one of them in the innermost loop (average rank is common in statistics, minimum rank in sports).

Please note that the averaged ranks can be non-integer ( n+0.5

). I don't know if rounding to the integral rank n

is a problem for your application.

The algorithm can easily be generalized to custom orders such as pop[i].costs()

with std::less<>

default.

+2


source


Something like this:

size_t run_start = 0;
double run_cost = pop[index[0]].costs();
for (size_t idx = 1; idx <= n; ++idx) {
  double new_cost = idx < n ? pop[index[idx]].costs() : 0;
  if (idx == n || new_cost != run_cost) {
    double avg_rank = (run_start + 1 + idx) / 2.0;
    for (size_t j = run_start; j < idx; ++j) {
       pop[index[j]].set_rank_costs(avg_rank);
    }

    run_start = idx;
    run_cost = new_cost;
  }
}

      

Basically, you are iterating over the sorted sequence and identifying runs of equal values ​​(possibly of length of length 1). For each such run, you calculate its average rank and set it for all elements in the run.

0


source







All Articles