3. (2pts) we will formulate Tic-Tac-Toe as an environment in which we can train a reinforcement learning agent. You will
Posted: Thu Jul 14, 2022 2:18 pm
3. (2pts) We will formulate Tic-Tac-Toe as an environment in which we can train a reinforcement learning agent. You will play as X's, and your opponent will play as O's. Two-player games such as Tic-Tac-Toe are often modeled using game theory, in which we also try to predict our opponent's moves. For simplicity, we ignore the modeling of the opponent's moves and treat the opponent's actions as a source of randomness within the environment. Assume you always go first. What are the states and actions within the Tic-Tac-Toe reinforcement learning environment? How does the current state affect the actions you can take?
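One possible way to make the question concrete is sketched below in Python. It is an illustration under stated assumptions, not the expected answer: the state is taken to be the 9-cell board, an action is the index of an empty cell where X is placed, and the opponent's reply is sampled uniformly at random as part of the environment's transition. The names `legal_actions`, `winner`, and `step`, and the +1 / -1 / 0 reward values, are hypothetical choices that the question itself does not specify.

```python
import random

EMPTY, X, O = " ", "X", "O"

# The eight index triples that form a row, column, or diagonal.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def legal_actions(state):
    """Actions are the indices of empty cells, so they depend on the state."""
    return [i for i, cell in enumerate(state) if cell == EMPTY]


def winner(state):
    """Return X or O if that player has three in a line, else None."""
    for a, b, c in WIN_LINES:
        if state[a] != EMPTY and state[a] == state[b] == state[c]:
            return state[a]
    return None


def step(state, action, rng=random):
    """Apply the agent's X move, then a random O reply (environment noise).

    Returns (next_state, reward, done). Reward values are assumed here:
    +1 for an X win, -1 for an O win, 0 otherwise.
    """
    if action not in legal_actions(state):
        raise ValueError("cell %d is already occupied" % action)
    board = list(state)
    board[action] = X
    if winner(board) == X:
        return tuple(board), 1.0, True
    empties = legal_actions(board)
    if not empties:  # board is full with no winner: a draw
        return tuple(board), 0.0, True
    board[rng.choice(empties)] = O  # opponent treated as randomness
    if winner(board) == O:
        return tuple(board), -1.0, True
    return tuple(board), 0.0, False


initial_state = (EMPTY,) * 9
```

In this sketch the set of available actions shrinks as the board fills: all nine cells are legal from `initial_state`, and after one call to `step` only seven remain, because both the agent's X and the opponent's O occupy a cell.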