Optimizing runtime in a simple vector implementation
I have just started implementing my own vector class, and I am testing it with a simple program to check how long it takes to run. One test took 2:30 minutes, and the others took 90 and 29 seconds.
What is striking is how slow this class is. Can you help me track down the source of the problem?
Test:
#include "MyVector.h"
const unsigned int SIZE_V= 1000000;
const unsigned int RUNS= 10000;
int main() {
MyVector v(SIZE_V);
for (unsigned int j=0; j<RUNS; ++j) {
for (unsigned int i=0; i<SIZE_V; ++i) {
v[i]= i;
}
}
return 0;
}
Class:
MyVector.h:
#ifndef MY_VECTOR_H
#define MY_VECTOR_H
class MyVector {
public:
MyVector(unsigned int size);
~MyVector();
int& operator[](unsigned int i);
private:
int* _data;
unsigned int _size;
MyVector(const MyVector&);
MyVector& operator=(const MyVector&);
};
#endif
MyVector.cpp:
#include "MyVector.h"
#include <assert.h>
MyVector::MyVector(unsigned int size) : _data(new int[size]) {
}
MyVector::~MyVector() {
delete[] _data;
}
int& MyVector::operator[](unsigned int i) {
assert(i<_size);
return _data[i];
}
EDIT:
These are the profiling results (gprof call graph):
granularity: each sample hit covers 4 byte(s) for 0.04% of 27.09 seconds
index % time self children called name
<spontaneous>
[1] 100.0 12.51 14.58 main [1]
11.28 0.00 1410065408/1410065408 MyVector::operator[](unsigned int) [2]
3.31 0.00 1/1 MyVector::~MyVector() [3]
0.00 0.00 1/1 MyVector::MyVector(unsigned int) [7]
-----------------------------------------------
11.28 0.00 1410065408/1410065408 main [1]
[2] 41.6 11.28 0.00 1410065408 MyVector::operator[](unsigned int) [2]
-----------------------------------------------
3.31 0.00 1/1 main [1]
[3] 12.2 3.31 0.00 1 MyVector::~MyVector() [3]
-----------------------------------------------
0.00 0.00 1/1 main [1]
[7] 0.0 0.00 0.00 1 MyVector::MyVector(unsigned int) [7]
-----------------------------------------------
One thing you might want to do is make operator[]
inline. When I do this, the performance of your code on my box improves threefold, from
real 0m18.270s
to
real 0m6.030s
In the last test, each iteration of the inner loop takes about 0.6 ns (!), or about 1.5 clock cycles.
This is on a Sandy Bridge box using g++ 4.7.2 with -O3.
PS There is a bug in the code: the constructor does not initialize _size,
so the assert()
reads an uninitialized value and has undefined behavior.
You are writing:
1000000 * 10000 * 4 * 8 = 320,000,000,000
bits of data in total, which in the tests comes to:
2.5 mins = 2,133,333,333 bits/sec = ~2,133 Mbit/s
90 secs = 3,555,555,555 bits/sec = ~3,556 Mbit/s
30 secs = 10,666,666,666 bits/sec = ~10,667 Mbit/s
The maximum data transfer rate of DDR2 ranges from 3,200 MB/s to 8,533 MB/s, and the peak data rate of DDR3 ranges from 6,400 MB/s to 17,066 MB/s.
Based on this, I would say you have DDR3-1600 chips.