IIR filter optimization
Quick question related to IIR filter coefficients. Here is a very typical implementation of a direct form II biquad IIR processor that I found on the internet.
// b0, b1, b2, a1, a2 are filter coefficients
// m1, m2 are the memory locations
// dn is the de-denormal coeff (=1.0e-20f)
void processBiquad(const float* in, float* out, unsigned length)
{
for(unsigned i = 0; i < length; ++i)
{
register float w = in[i] - a1*m1 - a2*m2 + dn;
out[i] = b1*m1 + b2*m2 + b0*w;
m2 = m1; m1 = w;
}
dn = -dn;
}
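For readers unfamiliar with the dn term: it is a tiny offset added to the recursion so that the filter state does not decay into subnormal (denormal) floats during silence, which can be slow on some CPUs. Below is a minimal sketch of that effect. The helper name countSubnormalStates and the coefficients are assumptions chosen for illustration, and the result assumes default floating-point settings (no flush-to-zero, no fast-math).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: feeds a unit impulse followed by silence through the
// recursive part of the biquad and counts how often the newest state value
// is a subnormal float. Passing dn = 0 disables the offset.
unsigned countSubnormalStates(float dn, unsigned length = 2000)
{
    assert(length > 0);
    // Assumed example coefficients: stable poles at 0.5 +/- 0.5i.
    const float a1 = -1.0f, a2 = 0.5f;
    float m1 = 0.0f, m2 = 0.0f;
    unsigned count = 0;
    for (unsigned i = 0; i < length; ++i)
    {
        const float in = (i == 0) ? 1.0f : 0.0f;
        const float w = in - a1*m1 - a2*m2 + dn;
        if (std::fpclassify(w) == FP_SUBNORMAL)
            ++count;
        m2 = m1; m1 = w;
    }
    return count;
}
```

With dn = 0 the impulse response shrinks through the subnormal range before reaching zero; with dn = 1.0e-20f the state settles near the offset instead and stays a normal float.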
I understand that the "register" keyword is rather unnecessary, given how smart modern compilers are about this. My question is: are there any potential performance benefits to storing the filter coefficients in separate variables rather than in arrays and indexing into them? Does the answer depend on the target platform?
i.e.
out[i] = b[1]*m[1] + b[2]*m[2] + b[0]*w;
versus
out[i] = b1*m1 + b2*m2 + b0*w;
It really depends on your compiler and optimization options. Here's my take:
- Any modern compiler will simply ignore register. It is just a hint to the compiler, and modern ones just don't use it.
- Accesses with constant indexes inside a loop are usually optimized in an optimized build. In that sense, using variables or an array, as you have shown, makes no difference.
- Always, always run your tests and look at the generated code for performance-critical sections.
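As one way of "running your tests", here is a minimal sketch that checks the two variants produce the same output and times each of them. Everything in it (the struct names, the example coefficients, the helpers makeNamed, makeArray, makeInput, maxAbsDifference and secondsPerCall) is made up for illustration; a[0] and m[0] are left unused so the indices match the array form in the question.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <vector>

// Coefficients and state in individually named members, as in the question.
struct NamedBiquad
{
    float a1, a2, b0, b1, b2, dn, m1, m2;
    void process(const float* in, float* out, unsigned length)
    {
        for (unsigned i = 0; i < length; ++i)
        {
            float w = in[i] - a1*m1 - a2*m2 + dn;
            out[i] = b1*m1 + b2*m2 + b0*w;
            m2 = m1; m1 = w;
        }
        dn = -dn;
    }
};

// Same filter with arrays; a[0] and m[0] are unused.
struct ArrayBiquad
{
    float a[3], b[3], dn, m[3];
    void process(const float* in, float* out, unsigned length)
    {
        for (unsigned i = 0; i < length; ++i)
        {
            float w = in[i] - a[1]*m[1] - a[2]*m[2] + dn;
            out[i] = b[1]*m[1] + b[2]*m[2] + b[0]*w;
            m[2] = m[1]; m[1] = w;
        }
        dn = -dn;
    }
};

// Arbitrary but stable example coefficients (an assumption, not from the post).
NamedBiquad makeNamed()
{
    NamedBiquad f = { -0.5f, 0.25f, 0.2f, 0.4f, 0.2f, 1.0e-20f, 0.0f, 0.0f };
    return f;
}

ArrayBiquad makeArray()
{
    ArrayBiquad f = { { 1.0f, -0.5f, 0.25f }, { 0.2f, 0.4f, 0.2f }, 1.0e-20f, { 0.0f, 0.0f, 0.0f } };
    return f;
}

// Deterministic sawtooth test signal in [-0.5, 0.5).
std::vector<float> makeInput(unsigned length)
{
    std::vector<float> in(length);
    for (unsigned i = 0; i < length; ++i)
        in[i] = static_cast<float>(i % 64) / 64.0f - 0.5f;
    return in;
}

// Largest absolute difference between the outputs of the two variants.
float maxAbsDifference(unsigned length)
{
    assert(length > 0);
    const std::vector<float> in = makeInput(length);
    std::vector<float> outNamed(length), outIndexed(length);
    NamedBiquad named = makeNamed();
    ArrayBiquad indexed = makeArray();
    named.process(in.data(), outNamed.data(), length);
    indexed.process(in.data(), outIndexed.data(), length);
    float worst = 0.0f;
    for (unsigned i = 0; i < length; ++i)
    {
        const float d = std::fabs(outNamed[i] - outIndexed[i]);
        if (d > worst) worst = d;
    }
    return worst;
}

// Average wall-clock seconds per call of process() over `repeats` calls.
template <class Filter>
double secondsPerCall(Filter filter, const std::vector<float>& in, unsigned repeats)
{
    assert(!in.empty() && repeats > 0);
    std::vector<float> out(in.size());
    volatile float sink = 0.0f; // keeps the optimizer from discarding the work
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (unsigned r = 0; r < repeats; ++r)
    {
        filter.process(in.data(), out.data(), static_cast<unsigned>(in.size()));
        sink = out.back();
    }
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeats;
}
```

Calling secondsPerCall(makeNamed(), makeInput(65536), 1000) and then the same with makeArray() gives two numbers to compare on your own compiler and target. Timings like these are noisy, so treat small differences with suspicion and check the generated code as well.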
EDIT: OK, just out of curiosity, I wrote a little program and got identical generated code for both versions using full optimization with VS2010. Here is what I get inside the loop for the expression in question (exactly the same in both cases):
0128138D fmul dword ptr [eax+0Ch]
01281390 faddp st(1),st
01281392 fld dword ptr [eax+10h]
01281395 fld dword ptr [w]
01281398 fld st(0)
0128139A fmulp st(2),st
0128139C fxch st(2)
0128139E faddp st(1),st
012813A0 fstp dword ptr [ecx+8]
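Operands such as [eax+0Ch] and [eax+10h] are loads at a fixed displacement from a base register. A constant array index gets folded into that kind of displacement at compile time, so it ends up as the same sort of operand as a named member. Here is a small sketch of why, assuming 4-byte floats; NamedCoeffs, ArrayCoeffs and arrayOffsetOfB are hypothetical stand-ins with public members, not the classes from the test program.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the coefficient part of the two layouts.
struct NamedCoeffs { float a1, a2, b0, b1, b2; };
struct ArrayCoeffs { float a[2]; float b[3]; };

// Byte offset of b[index] inside ArrayCoeffs. For a constant index the
// compiler can compute this at compile time.
inline std::size_t arrayOffsetOfB(std::size_t index)
{
    assert(index < 3);
    return offsetof(ArrayCoeffs, b) + index * sizeof(float);
}
```

In this sketch b1 and b[1] sit at the same byte offset, and so do b2 and b[2], which is why neither form needs extra address arithmetic at run time.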
Note that I've added a few lines to output the results to make sure the compiler isn't just optimizing everything away. Here is the code:
#include <iostream>
#include <iterator>
#include <algorithm>
class test1
{
float a1, a2, b0, b1, b2;
float dn;
float m1, m2;
public:
void processBiquad(const float* in, float* out, unsigned length)
{
for(unsigned i = 0; i < length; ++i)
{
float w = in[i] - a1*m1 - a2*m2 + dn;
out[i] = b1*m1 + b2*m2 + b0*w;
m2 = m1; m1 = w;
}
dn = -dn;
}
};
class test2
{
float a[2], b[3];
float dn;
float m1, m2;
public:
void processBiquad(const float* in, float* out, unsigned length)
{
for(unsigned i = 0; i < length; ++i)
{
float w = in[i] - a[0]*m1 - a[1]*m2 + dn;
out[i] = b[0]*m1 + b[1]*m2 + b[2]*w;
m2 = m1; m1 = w;
}
dn = -dn;
}
};
int main() // standard entry point; _tmain/_TCHAR would need <tchar.h>
{
test1 t1;
test2 t2;
float a[1000];
float b[1000];
t1.processBiquad(a, b, 1000);
t2.processBiquad(a, b, 1000);
std::copy(b, b+1000, std::ostream_iterator<float>(std::cout, " "));
return 0;
}