Finding pairs of divisors

Question

Finding pairs of divisors

I'm trying to solve this exercise http://main.edu.pl/en/archive/amppz/2014/dzi and I don't know how to improve the performance of my code. Problems arise when the program has to process more than 500,000 unique numbers (up to 2,000,000, as in the description). Then it took 1-8 seconds to iterate over all these numbers. The tests I have used refer to http://main.edu.pl/en/user.phtml?op=tests&c=52014&task=1263 and I am testing it with the command
program.exe < data.in > result.out

Description: You are given a sequence of _n integer a₁, a₂, ... a_n. You should determine the number of such ordered pairs(_i, _j), that _i, _j equeals(1, ..., _n), _i != _j and a_i is divisor of a_j. The first line of input contains one integer _n(1 <= _n <= 2000000) The second line contains a sequence of _n integers a₁, a₂, ..., a_n(1 <= a_i <= 2000000). In the first and only line of output should contain one integer, denoting the number of pairs sought. For the input data: 5 2 4 5 2 6 the correct answer is: 6 Explanation: There are 6 pars: (1, 2) = 4/2, (1, 4) = 2/2, (1, 5) = 6/2, (4, 1) = 2/2, (4, 2) = 4/2, (4, 5) = 6/2.

For example:
- with a total of 2M and unique numbers of 635 thousand, there are 345 million iterations
in total - with 2M in total and 2 million numbers without a number, a total of 1885 million iterations

#include <iostream>
#include <math.h>
#include <algorithm>

#include <time.h>


#define COUNT_SAME(count) (count - 1) * count


int main(int argc, char **argv) {
    std::ios_base::sync_with_stdio(0);

    int n; // Total numbers
    scanf("%d", &n);

    clock_t start, finish;
    double  duration;

    int minVal = 2000000;
    long long *countVect = new long long[2000001]; // 1-2,000,000; Here I'm counting duplicates

    unsigned long long counter = 0;
    unsigned long long operations = 0;

    int tmp;
    int duplicates = 0;

    for (int i = 0; i < n; i++) {
        scanf("%d", &tmp);

        if (countVect[tmp] > 0) { // Not best way, but works
            ++countVect[tmp];
            ++duplicates;
        } else {
            if (minVal > tmp)
                minVal = tmp;

            countVect[tmp] = 1;
        }
    }

    start = clock();

    int valueJ;
    int sqrtValue, valueIJ;
    int j;

    for (int i = 2000000; i > 0; --i) {
        if (countVect[i] > 0) { // Not all fields are setted up
            if (countVect[i] > 1) 
                counter += COUNT_SAME(countVect[i]); // Sum same values

            sqrtValue = sqrt(i);

            for (j = minVal; j <= sqrtValue; ++j) {
                if (i % j == 0) {
                    valueIJ = i / j;

                    if (valueIJ != i && countVect[valueIJ] > 0 && valueIJ > sqrtValue)
                        counter += countVect[i] * countVect[valueIJ];

                    if (i != j && countVect[j] > 0)
                        counter += countVect[i] * countVect[j];
                }

                ++operations;
            }
        }
    }

    finish = clock();
    duration = (double)(finish - start) / CLOCKS_PER_SEC;
    printf("Loops time: %2.3f", duration);
    std::cout << "s\n";
    std::cout << "\n\nCounter: " << counter << "\n";
    std::cout << "Total operations: " << operations;

    std::cout << "\nDuplicates: " << duplicates << "/" << n;
    return 0;
}

I know I shouldn't be sorting the array at the beginning, but I have no idea how to do it better.

Any advice would be great, thanks!

Here's an improved algorithm - 2M unique numbers within 0.5s. Thanks @PJTraill!

#include <iostream>
#include <math.h>
#include <algorithm>

#include <time.h>


#define COUNT_SAME(count) (count - 1) * count


int main(int argc, char **argv) {
    std::ios_base::sync_with_stdio(0);

    int n; // Total numbers
    scanf("%d", &n);

    clock_t start, finish;
    double  duration;

    int maxVal = 0;
    long long *countVect = new long long[2000001]; // 1-2,000,000; Here I'm counting duplicates

    unsigned long long counter = 0;
    unsigned long long operations = 0;

    int tmp;
    int duplicates = 0;

    for (int i = 0; i < n; i++) {
        scanf("%d", &tmp);

        if (countVect[tmp] > 0) { // Not best way, but works
            ++countVect[tmp];
            ++duplicates;
        } else {
            if (maxVal < tmp)
                maxVal = tmp;

            countVect[tmp] = 1;
        }
    }

    start = clock();

    int j;
    int jCounter = 1;

    for (int i = 0; i <= maxVal; ++i) {
        if (countVect[i] > 0) { // Not all fields are setted up
            if (countVect[i] > 1)
                counter += COUNT_SAME(countVect[i]); // Sum same values

            j = i * ++jCounter;

            while (j <= maxVal) {
                if (countVect[j] > 0)
                    counter += countVect[i] * countVect[j];

                j = i * ++jCounter;
                ++operations;
            }

            jCounter = 1;
        }
    }

    finish = clock();
    duration = (double)(finish - start) / CLOCKS_PER_SEC;
    printf("Loops time: %2.3f", duration);
    std::cout << "s\n";
    std::cout << "\n\nCounter: " << counter << "\n";
    std::cout << "Total operations: " << operations;

    std::cout << "\nDuplicates: " << duplicates << "/" << n;
    return 0;
}

+3

c ++ performance algorithm

Synchro May 29 '15 at 21:32

source to share

2 answers

Ok, I'm not going to write your whole algorithm for you, but it can definitely be done faster. So, I guess this is what you need to go:

So your list is sorted, so you can make a lot of assumptions from this. Take the highest value for example. It won't have multiples. The highest value that does will be the highest value divided by two.

There is also another very useful fact here. Multiples are also multiples. (Still following?;)). Take, for example, the list [2 4 12]. Now you have found (4.12) as a pair. If you now find (2,4) as well, you can infer that 12 is also a multiple of 2. And since you only need to count the pairs, you can just count the number for each number, how many multiples it has, and add that when you see this number is like a few. This means that it is best to iterate backward through the sorted list and search for divisors instead.

And maybe save it in some way [ (three 2 ), (two 5's), ...]

that is. how often the number occurs. Once again, you don't need to keep track of its id as you only need to give them the total number of pairs. Keeping the list this way will help you because all 2 will have the same number of multiples. So, calculate once and then multiply.

0

David van rijn May 29 '15 at 22:23

source to share

PJTraill · Accepted Answer · 2015-05-30T10:59:24+0000

I expect the following to be much faster than the OPs algorithm (optimization oblique):

(Values and frequencies type must be unsigned 32-bit, considered 64-bit - advance before counting if your language won't.)
Read the number of values, N.
Read each v value by adding it to the freq [v] frequency (no need to store it).
- (freq [MAX] (or MAX + 1) can be statically allocated for possible optimal initialization for all 0s)
Calculate the number of pairs including 1 of freq [1] and the number of values.
For each i from 2..MAX (with frequency [i]> 0):
- Calculate the number of pairs (i, i) from freq [i].
- For every multiple of m of i in 2m..MAX:
  - (Use m as a loop counter and increment it, not multiply)
  - Calculate the number of pairs (i, m) from freq [i] and freq [m].
- (if freq [i] = 1, you can omit the calculation of (i, i) and run a version of the loop optimized for freq [i] = 1)
(You can execute the previous (outer) loop from 2..MAX / 2, and then from MAX / 2 + 1..MAX, skipping processing of multiples)

Number of pairs (i, i) = _{freq [i]} C ₂= (freq [i] * (freq [i] - 1)) / 2.
Number of pairs (i, j) = freq [i] * freq [j] for i ≠ j.

This avoids sorting sqrt

and division.

Other optimizations

You can store individual values and scan that array instead (order doesn't matter); gain or loss due to this depends on the density of values in 1..MAX.

If the maximum frequency is <2 ¹⁶ which sounds very likely, all products will be 32 bits. You can take advantage of this by writing numeric functions as a template, keeping track of the maximum frequency, and then selecting the appropriate template instance for the rest. This costs N * (compare + branch) and can be obtained by performing D ² multiplications with 32 bits instead of 64, where D is the number of distinct values. I don't see an easy way to infer that 32 bits is enough for a total other than N <2 ¹⁶ .

If you parallelize this for n processors, you can allow different processors to process different residuals modulo n.

I thought about keeping track of the number of even values to avoid scanning half the frequencies, but I think for most datasets in the given parameters that will do little benefit.

Finding pairs of divisors

Other optimizations

More articles: