1. Consider that an agent is moving in the Gridworld shown in Figure 1. The only action available in cells A, F, G, and
- Site Admin
- Joined: Mon Aug 02, 2021 8:13 am
1. Consider that an agent is moving in the Gridworld shown in Figure 1. The only action available in cells A, F, G, and H is Exit, with rewards 2X, X, 0, and 0, respectively. Here, X = (the last digit of your student ID) + 1. When the agent exits from a cell, it receives the reward specified for that cell. From the other cells, the agent can only take the action Left or Right, which moves it to the immediately adjacent cell on the left or right, respectively.

[Figure 1: Gridworld for … (cells labeled G, H, A, B, C, D, E, F)]

If the agent is in cell B or C and takes a Left or Right action, the action may fail: the agent moves in the desired direction with probability p, and it instead moves up with probability 1 - p. If the agent is in any other cell (A, D, E, F, G, or H), the action always succeeds. Assume that the agent plays optimally, there is no living reward or penalty, and the discount factor is γ ∈ [0, 1].

a) [CO1, PO1] For each of the following policies, determine the value of each non-terminal state.
i. π_right(S) = Exit if S is a terminal state; Right otherwise.
ii. π_left(S) = Exit if S is a terminal state; Left otherwise.
b) [CO3, PO2, PO3] For what range of values of p will the agent choose π_left over π_right?
c) [CO3, PO2, PO3] For what range of values of p will the agent choose π_right over π_left?
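One way to check answers to (a) and probe (b) and (c) numerically is iterative policy evaluation of the two fixed policies. The sketch below is a minimal illustration under assumptions that are not confirmed by the post, because Figure 1 itself is missing: cells A to F form one row, G sits directly above B and H directly above C (so a failed move from B or C lands in G or H), and the value of a terminal cell is simply its Exit reward. The names `ROW`, `UP`, `exit_values`, `transitions`, and `evaluate` are hypothetical, and the numbers in the `__main__` block are arbitrary examples.

```python
# Hypothetical encoding of the Figure 1 gridworld. ASSUMED layout (the figure
# is not available): A B C D E F in one row, G directly above B and H directly
# above C, so a failed Left/Right from B or C sends the agent up into G or H.
ROW = ["A", "B", "C", "D", "E", "F"]
UP = {"B": "G", "C": "H"}
NON_TERMINAL = ["B", "C", "D", "E"]


def exit_values(x):
    """Assumed value of each terminal cell: the reward for taking Exit there."""
    return {"A": 2 * x, "F": x, "G": 0.0, "H": 0.0}


def transitions(state, action, p):
    """(probability, next_state) pairs for Left/Right from a non-terminal cell."""
    i = ROW.index(state)
    target = ROW[i - 1] if action == "Left" else ROW[i + 1]
    if state in UP:  # only B and C are slippery
        return [(p, target), (1 - p, UP[state])]
    return [(1.0, target)]


def evaluate(action, p, gamma, x, sweeps=10):
    """Iterative policy evaluation for pi_right (action="Right") or
    pi_left (action="Left"), with no living reward."""
    values = exit_values(x)
    values.update({s: 0.0 for s in NON_TERMINAL})
    # Each policy moves in a single direction, so the chain is acyclic and a
    # handful of in-place sweeps reaches the exact values, even for gamma == 1.
    for _ in range(sweeps):
        for s in NON_TERMINAL:
            values[s] = sum(prob * gamma * values[nxt]
                            for prob, nxt in transitions(s, action, p))
    return {s: values[s] for s in NON_TERMINAL}


if __name__ == "__main__":
    p, gamma, x = 0.5, 0.9, 3  # example numbers only; X depends on the student ID
    right = evaluate("Right", p, gamma, x)
    left = evaluate("Left", p, gamma, x)
    for s in NON_TERMINAL:
        if left[s] > right[s]:
            better = "pi_left"
        elif right[s] > left[s]:
            better = "pi_right"
        else:
            better = "tie"
        print(f"{s}: V_right={right[s]:.4f}  V_left={left[s]:.4f}  -> {better}")
```

Under the assumed layout, the loop reproduces the hand-derived values V_right(E) = γX, V_right(D) = γ²X, V_right(C) = pγ³X, V_right(B) = p²γ⁴X and V_left(B) = 2pγX, V_left(C) = 2p²γ²X, V_left(D) = 2p²γ³X, V_left(E) = 2p²γ⁴X. Because every value is proportional to X, the choice between the two policies does not depend on X (for X > 0), only on p, γ, and the state being compared.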