Optimizing neighbor counting function for conway game of life in C

Some problems have an optimization function that returns the number of neighbors of a cell in the Conway Game of Life implementation. I'm trying to learn C and just getting better at coding. I am not very good at potential optimization and I spent a lot of time reading the various methods online, but it really is not for me yet.

Specifically, I'm trying to figure out how to unwrap this nested loop in the most efficient way, but every time I try, I just make the runtime longer. I'm including the function, I don't think any other context is needed. Thanks for any advice you can give!

Here is the function code countNeighbors()

:

static int countNeighbors(board b, int x, int y)
{
   int n = 0;

   int x_left = max(0, x-1);
   int x_right = min(HEIGHT, x+2);
   int y_left = max(0, y-1);
   int y_right = min(WIDTH, y+2);

   int xx, yy;
   for (xx = x_left; xx < x_right; ++xx) {
       for (yy = y_left; yy < y_right; ++yy) {
           n += b[xx][yy];
       }
   }

   return n - b[x][y];
}

      

+3


source to share


1 answer


Instead of declaring the fee as, b[WIDTH][HEIGHT]

declare it as b[WIDTH + 2][HEIGHT + 2]

. This gives additional headroom to have zeros, but it prevents the index from being exceeded. So instead of:

 x x
 x x

      

We'll have:

 0 0 0 0 
 0 x x 0
 0 x x 0
 0 0 0 0 

      

x

indicates used cells, 0

will not be used.

Typical tradeoff: bit of memory for speed.

Thanks to this, we do not need to call functions min

and max

(which are bad for performance operators if

).

Finally, I would write your function like this:

int countNeighborsFast(board b, int x, int y)
{
    int n = 0;
    n += b[x-1][y-1];
    n += b[x][y-1];
    n += b[x+1][y-1];
    n += b[x-1][y];
    n += b[x+1][y];
    n += b[x-1][y+1];
    n += b[x][y+1];
    n += b[x+1][y+1];
    return n;
}

      

Benchmark (updated)

Complete working source code .

Thanks to Jongware's comment, I added linearization ( downsizing the array from 2 to 1) and changing int

to char

.

I also made the main loop linear and calculated the returned sum directly, without an intermediate variable n

.

2D array was 10002 x 10002, 1D had 100040004 elements.



The processor I have has a Pentium Dual-Core T4500 at 2.30 GHz, more details here (output cat /prof/cpuinfo

).

Results at the default optimization level O0

:

Original: 15.50s
Mine: 10.13s
Linear: 2.51s
LinearAndChars: 2.48s
LinearAndCharsAndLinearLoop: 2.32s
LinearAndCharsAndLinearLoopAndSum: 1.53s

      

This is about 10 times faster than the original version.

Results on O2

:

Original: 6.42s
Mine: 4.17s
Linear: 0.55s
LinearAndChars: 0.53s
LinearAndCharsAndLinearLoop: 0.42s
LinearAndCharsAndLinearLoopAndSum: 0.44s

      

About 15 times.

On O3

:

Original: 10.44s
Mine: 1.47s
Linear: 0.26s
LinearAndChars: 0.26s
LinearAndCharsAndLinearLoop: 0.25s
LinearAndCharsAndLinearLoopAndSum: 0.24s

      

Approximately 44 times.

The latest version LinearAndCharsAndLinearLoopAndSum

:

typedef char board3[(HEIGHT + 2) * (WIDTH + 2)];

int i;
for (i = WIDTH + 3; i <= (WIDTH + 2) * (HEIGHT + 1) - 2; i++)
    countNeighborsLinearAndCharsAndLinearLoopAndSum(b3, i);

int countNeighborsLinearAndCharsAndLinearLoopAndSum(board3 b, int pos)
{
    return
    b[pos - 1 - (WIDTH + 2)] +
    b[pos     - (WIDTH + 2)] +
    b[pos + 1 - (WIDTH + 2)] +
    b[pos - 1] +
    b[pos + 1] +
    b[pos - 1 + (WIDTH + 2)] +
    b[pos     + (WIDTH + 2)] +
    b[pos + 1 + (WIDTH + 2)];
}

      

Changing 1 + (WIDTH + 2)

to WIDTH + 3

won't help, because the compiler will take care of it anyway (even at the optimization level O0

).

+5


source







All Articles