1. Consider that an agent is moving in the Gridworld shown in Figure 1. The only action available in cells A, F, G, and



Post by answerhappygod »

1. Consider that an agent is moving in the Gridworld shown in Figure 1. The only action available in cells A, F, G, and H is Exit, with rewards 2X, X, 0, and 0, respectively. Here, X = the last digit of your student ID + 1. When the agent exits from a cell, it gets the reward for that cell as specified. From the other cells, the agent can only take the action Left or Right, which results in the agent moving to the immediate left or right cell, respectively.

[Figure 1: Gridworld for Question 1. Top row: G, H. Bottom row: A, B, C, D, E, F.]

If the agent is in cell B or C and takes a Left or Right action, the action might fail. The agent will move in the desired direction with probability p, and it will fail and move up with probability 1 - p. If the agent is in any other cell (A, D, E, F, G, or H), the action will always be successful. Assume that the agent plays optimally, there is no living reward/penalty, and the discount factor is γ ∈ [0, 1].

a) [CO1, PO1] For each of the following policies, determine the value of each non-terminal state.
i. π_right(S) = Exit if S is a terminal state; Right otherwise.
ii. π_left(S) = Exit if S is a terminal state; Left otherwise.

b) [CO3, PO2, PO3] For what range of values of p will the agent choose π_left over π_right?

c) [CO3, PO2, PO3] For what range of values of p will the agent choose π_right over π_left?
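For anyone who wants to check hand calculations for part a), here is a minimal policy-evaluation sketch in Python for the always-Right and always-Left policies. It rests on assumptions that the post does not confirm: that G sits directly above B and H directly above C (so a failed move from B or C lands in G or H), that the value of a terminal cell equals its exit reward, and that each move is discounted by γ once. The names `ROW`, `UP`, `exit_rewards`, and `evaluate` are made up for this illustration.

```python
# Assumed layout (the figure itself is not available in the post):
#   top row:       G H
#   bottom row:  A B C D E F
ROW = ["A", "B", "C", "D", "E", "F"]
UP = {"B": "G", "C": "H"}  # assumed landing cell when a move from B or C fails


def exit_rewards(X):
    """Exit rewards for the terminal cells, as stated in the question."""
    return {"A": 2 * X, "F": X, "G": 0, "H": 0}


def evaluate(direction, p, gamma, X, sweeps=10):
    """Iterative policy evaluation for a fixed 'Left' or 'Right' policy.

    Returns the values of the non-terminal cells B, C, D, E.
    """
    step = -1 if direction == "Left" else 1
    V = {s: 0.0 for s in ROW + ["G", "H"]}
    V.update(exit_rewards(X))  # assumption: terminal value = exit reward
    for _ in range(sweeps):  # the chain has 4 cells, so a few sweeps suffice
        for i, s in enumerate(ROW):
            if s in ("A", "F"):
                continue  # terminal cells keep their exit value
            target = ROW[i + step]
            if s in UP:
                # Move succeeds with probability p, otherwise the agent goes up.
                V[s] = gamma * (p * V[target] + (1 - p) * V[UP[s]])
            else:
                V[s] = gamma * V[target]
    return {s: V[s] for s in ("B", "C", "D", "E")}


if __name__ == "__main__":
    # Example values only; X depends on the student ID in the question.
    print(evaluate("Right", p=0.5, gamma=1.0, X=2))
    print(evaluate("Left", p=0.5, gamma=1.0, X=2))
```

Comparing the two returned dictionaries cell by cell for different values of p gives a numerical check on whatever ranges are derived by hand for parts b) and c).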