How do you update the Q-values for a two-player game?

For a single-player game, the Q-value updates are fairly intuitive: the current state and the next state depend on the strategy of only one player. With two players this is no longer the case. Consider a scenario in which your opponent wins and the game ends. How are the Q-values updated?



1 answer


One general approach is to treat your opponent as part of the environment, so the state is defined to include the opponent's position. You choose an action and perform it, changing the state. Then the opponent takes their action, changing the state again. Your agent then observes the next state, which is the result of both its own action and the opponent's reply.



So, in the case where you perform an action a in state s, and the opponent then acts and ends the game, you must record the transition from s to the terminal state through a.
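
The idea above can be sketched in Python. This is only an illustration under stated assumptions, not part of the original answer: the game (a pile of stones, each player removes 1 or 2, whoever takes the last stone wins), the rewards of +1 for a win and -1 for a loss, the constants ALPHA and GAMMA, and the names legal, step and q_update are all hypothetical. The point is that the opponent's reply happens inside step, so a loss caused by the opponent is recorded as a transition from s through a to a terminal state.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9   # assumed learning rate and discount factor
ACTIONS = (1, 2)          # hypothetical game: remove 1 or 2 stones

def legal(state):
    """Actions available when `state` stones remain."""
    return [a for a in ACTIONS if a <= state]

def step(state, action, opponent_policy):
    """Apply the agent's move, then the opponent's reply, as one environment step."""
    state -= action
    if state == 0:
        return 0, 1.0, True    # agent took the last stone: win (assumed reward +1)
    state -= opponent_policy(state)
    if state == 0:
        return 0, -1.0, True   # opponent ended the game: loss (assumed reward -1)
    return state, 0.0, False

def q_update(Q, s, a, r, s_next, done):
    """Standard Q-learning update; a terminal state contributes no future value."""
    if done:
        target = r
    else:
        target = r + GAMMA * max(Q[(s_next, b)] for b in legal(s_next))
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

Q = defaultdict(float)
# From 3 stones the agent takes 1; the opponent then takes the remaining 2 and wins.
s_next, r, done = step(3, 1, lambda s: 2)
q_update(Q, 3, 1, r, s_next, done)
```

Here the update is applied to the pair (s=3, a=1), the agent's last decision, even though it was the opponent's move that actually ended the game.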







