Get the best selection from MiniMax algorithm in Tic-Tac-Toe

I am trying to implement the MiniMax algorithm based on Wikipedia pseudocode in a Tic-Tac-Toe game written in C. However, I cannot get the best possible movement. Here's my code:

#include <stdio.h>
#include <stdbool.h>
#include <string.h>

// compile and run: gcc minimax.c -std=c99 && ./a.out

int max(int x, int y) {
  return x > y ? x : y;
}

int min(int x, int y) {
  return x < y ? x : y;
}

int whoWon(char ch) {
  switch (ch) {
    case 'O':
      return -1;
      break;
    case 'X':
      return 1;
      break;
  }
}

void printArray(char array[]) {
  printf("# START\n"
         "%c | %c | %c\n"
         "--|---|--\n"
         "%c | %c | %c\n"
         "--|---|--\n"
         "%c | %c | %c\n"
         "# END\n\n", array[0], array[1], array[2], array[3], array[4], array[5], array[6], array[7], array[8]);
}

int anyWinners(char board[])
{
  int i;

  /* check every row */
  for(i = 0; i < 7; i += 3)
    if(board[i] != ' ' && board[i] == board[i+1] && board[i] == board[i+2])
      return whoWon(board[i]);

  /* check every column */
  for(i = 0; i < 3; i++)
    if(board[i] != ' ' && board[i] == board[i+3] && board[i] == board[i+6])
      return whoWon(board[i]);

  /* check diagonals */
  if(board[4] != ' ' && ((board[0] == board[4] && board[0] == board[8]) || (board[2] == board[4] && board[2] == board[6])))
    return whoWon(board[4]);

  return 0;
}

int fullBoard(char board[]) {
  for (int i = 0; i < 9; ++i) {
    if (board[i] == ' ')
      return 0;
  }
  return 1;
}

int minimax(char node[], int depth, bool maximizingPlayer, int * move) {
  int terminalNode = anyWinners(node);
  if (depth == 0 || terminalNode || fullBoard(node)) {
    printf("################## END OF SUBTREE ##################\n");
    return terminalNode;
  }

  int bestValue, val;
  if (maximizingPlayer) {
    bestValue = -2;
    for (int i = 0; i < 9; ++i) {
      if (node[i] == ' ') {
        char child[9];
        strcpy(child, node);
        child[i] = 'X';

        // debug
        printArray(child);

        val = minimax(child, depth - 1, false, move);

        // debug
        printf("X: ^^ i = %d ^^ depth = %d ^^ val = %d\n", i, depth, val);

        //bestValue = max(bestValue, val);
        if (val > bestValue) {
          bestValue = val;
          if (depth == 9) *move = i;
        }
      }
    }
    return bestValue;
  } else {
    bestValue = 2;
    for (int i = 0; i < 9; ++i) {
      if (node[i] == ' ') {
        char child[9];
        strcpy(child, node);
        child[i] = 'O';

        // debug
        printArray(child);

        val = minimax(child, depth - 1, true, move);

        // debug
        printf("O: ^^ i = %d ^^ depth = %d ^^ val = %d\n", i, depth, val);

        bestValue = min(bestValue, val);
      }
    }
    return bestValue;
  }
}

int main() {
  int move = -999; // initialize only for debug

  // X will always win no matter what, first best move for X is 8
  // char board[] = {'O', ' ', ' ',
  //                 ' ', ' ', ' ',
  //                 'X', 'X', ' '};

  // best move for X is 3
  char board[] = {'O', 'O', ' ',
                  ' ', 'X', 'X',
                  ' ', ' ', ' '};

  // Initial call for maximizing player
  int result = minimax(board, 9, true, &move);
  printf("minimax returned: %d\n", result);
  printf("chosen move: %d\n", move);

  return 0;
}

      

Coded PCB for each movement with the state of all variables. There are also two failing tests, mostly commented on. Now the algorithm is returning unsuccessful moves and I cannot find the error.

+3


source to share


1 answer


I see two problems:

  • The heuristic is incorrect.
  • There is a problem with strcpy.

The heuristic is incorrect

The Wikipedia pseudocode says:

if depth = 0 or node is a terminal node
    return the heuristic value of node

      

Your implementation does this:

if depth = 0 or node is a terminal node
    return 1 if X wins, -1 if O wins, 0 if it is a draw

      

But this is not a very good heuristic. With this heuristic, all possible paths that X can win are equally weighted. Therefore, if X finds a way to win in 3 moves, it is weighed in the same way as if X found a way to win in 2 moves, and it is weighed in the same way as if X found a way to win in 1 move.

So, here's what happens in your test case:

  • X tries to execute position 2.
  • O tries to execute position 3.
  • X tries to execute position 6.
  • This is the terminal node. X wins. So return positive 1.

Heuristic for this solution path = 1



Another possible possibility:

  • X tries to execute position 3.
  • This is the terminal node. X wins. So return positive 1.

Heuristic for this solution path = 1



Since both of these solutions have the same heuristic, they are both equal to each other. You probably assumed that this solution is suboptimal because it took too many moves to win. I propose a heuristic based on the number of moves it takes to get here, multiplying who is the winner. So, if X wins in 1 move, the heuristic is 5000. If X wins in 2 moves, then the heuristic is 2500. If O wins in 2 moves, the heuristic is -2500. Something like that.

The problem with strcpy

This line:

strcpy(child, node);

      

it should be:

memcpy(child, node, 9*sizeof(char));

      

Because "node" is not a nobody's terminated string. When I run this on VS2013 / Windows 8.1, my output is garbage because. You might get lucky on your platform.

+2


source







All Articles