Optimizing runtime in a simple vector implementation

I have just started implementing my own vector class and am testing it with a simple program to see how long it takes to run. One run took 2:30 minutes, and the others took 90 and 29 seconds.

Something is hurting the performance of this class. Can you help me track down the cause?

Test:

#include "MyVector.h"

const unsigned int SIZE_V= 1000000;
const unsigned int RUNS= 10000;

int main() {

      MyVector v(SIZE_V);

      for (unsigned int j=0; j<RUNS; ++j) {
        for (unsigned int i=0; i<SIZE_V; ++i) {
          v[i]= i;
        }
      }

      return 0;
}

      

Class:

MyVector.h:

#ifndef MY_VECTOR_H
#define MY_VECTOR_H

class MyVector {

 public:

      MyVector(unsigned int size);
      ~MyVector();

      int& operator[](unsigned int i);

 private:
      int* _data;
      unsigned int _size;
      MyVector(const MyVector&);
      MyVector& operator=(const MyVector&);

};
#endif

      

MyVector.cpp:

#include "MyVector.h"
#include <assert.h>

MyVector::MyVector(unsigned int size) : _data(new int[size]) {
}

MyVector::~MyVector() {
      delete[] _data;
}

int& MyVector::operator[](unsigned int i) {
      assert(i<_size);
      return _data[i];
}

      

EDIT:

These are the profiler results:

granularity: each sample hit covers 4 byte(s) for 0.04% of 27.09 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0   12.51   14.58                 main [1]
               11.28    0.00 1410065408/1410065408     MyVector::operator[](unsigned int) [2]
                3.31    0.00       1/1           MyVector::~MyVector() [3]
                0.00    0.00       1/1           MyVector::MyVector(unsigned int) [7]
-----------------------------------------------
               11.28    0.00 1410065408/1410065408     main [1]
[2]     41.6   11.28    0.00 1410065408         MyVector::operator[](unsigned int) [2]
-----------------------------------------------
                3.31    0.00       1/1           main [1]
[3]     12.2    3.31    0.00       1         MyVector::~MyVector() [3]
-----------------------------------------------
                0.00    0.00       1/1           main [1]
[7]      0.0    0.00    0.00       1         MyVector::MyVector(unsigned int) [7]
-----------------------------------------------

      



3 answers


One thing you might want to do is make operator[] inline. When I do this, the performance of your code on my box improves threefold, from

real    0m18.270s

      

to

real    0m6.030s

      



In the latter test, each iteration of the test loop takes about 0.6 ns (!), or about 1.5 clock cycles.

This is on a Sandy Bridge box using g++ 4.7.2 with -O3.

P.S. There is an error in the code: the constructor does not initialize _size, therefore the assert() has undefined behavior.
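A minimal sketch of how the two points above could look together, assuming the whole class is moved into MyVector.h. The layout is illustrative only; the speedup you see will depend on your compiler and flags.

```cpp
#include <cassert>

// Sketch of MyVector.h with operator[] defined inline in the header.
class MyVector {
 public:
      // Initialize _size too, so the assert below reads a defined value.
      // Members are initialized in declaration order: _data, then _size.
      MyVector(unsigned int size) : _data(new int[size]), _size(size) {}
      ~MyVector() { delete[] _data; }

      // Defined inside the class body, so it is implicitly inline and the
      // compiler can see the body at every call site.
      int& operator[](unsigned int i) {
            assert(i < _size);
            return _data[i];
      }

 private:
      int* _data;
      unsigned int _size;
      // Copying stays disabled, as in the original (declared, never defined).
      MyVector(const MyVector&);
      MyVector& operator=(const MyVector&);
};
```

With operator[] defined in MyVector.cpp, the compiler normally cannot inline the call into main unless link-time optimization is enabled, so every element access pays for a function call.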



  • Measure without running the profiler.

  • Measure fully optimized code: g++ -O3
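One way to take such a measurement from inside the program is sketched below, assuming a C++11 compiler so that std::chrono is available. The helper time_fill is hypothetical and not part of the original code; it works with anything that supports operator[], including the question's MyVector.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of the original code): runs the question's
// fill loop over any container with operator[] and returns the elapsed
// wall-clock time in seconds.
template <typename Vec>
double time_fill(Vec& v, unsigned int size, unsigned int runs) {
    assert(size > 0 && runs > 0);  // nothing to measure otherwise
    const auto start = std::chrono::steady_clock::now();
    for (unsigned int j = 0; j < runs; ++j) {
        for (unsigned int i = 0; i < size; ++i) {
            v[i] = static_cast<int>(i);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For the question's test this would be called as time_fill(v, SIZE_V, RUNS), in a binary built with g++ -O3 and without profiling instrumentation.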





You are writing:

1000000 * 10000 * 4 * 8 = 320000000000

bits of data in total, which in your tests works out to:

2.5 mins = 2133333333 bits/sec = ~2,000 Mbit/s

90 secs = 3555555555 bits/sec = ~3,400 Mbit/s

30 secs = 10666666666 bits/sec = ~10,000 Mbit/s
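The arithmetic above can be reproduced with the sketch below. The function names are made up for illustration, and the 4 bytes per int is the assumption used in this answer. The total does not fit in a 32-bit integer, so 64-bit types are used. The results are in bits per second; dividing by 8 gives bytes per second.

```cpp
#include <cassert>
#include <cstdint>

// Total bits written: elements * runs * bytes per element * 8 bits per byte.
std::uint64_t total_bits(std::uint64_t elements, std::uint64_t runs,
                         std::uint64_t bytes_per_element) {
    return elements * runs * bytes_per_element * 8u;
}

// Average rate in bits per second (integer division truncates).
std::uint64_t bits_per_second(std::uint64_t bits, std::uint64_t seconds) {
    assert(seconds > 0);
    return bits / seconds;
}
```

For example, bits_per_second(total_bits(1000000, 10000, 4), 150) gives the 2.5-minute figure.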

      

The peak data transfer rate of DDR2 ranges from 3,200 MB/s to 8,533 MB/s, and that of DDR3 from 6,400 MB/s to 17,066 MB/s.

Based on this, I would say you have DDR3-1600 chips.
