Consider the following environment: your agent is placed next to a cliff and must get to the goal. The shortest path to

Post by **answerhappygod** » Thu Jul 14, 2022 2:18 pm

Consider the following environment: your agent is placed next to a cliff and must get to the goal. The shortest path to the goal is to move along the edge of the cliff. There is also a longer path to the goal that requires the agent to first move away from the cliff, and then towards the goal. The reward for reaching the goal is 100 points, and the reward for falling off the cliff is −1000 points. Every move we make incurs a reward of −1. Assume we use an epsilon-greedy policy for exploration. If we would like to learn the shortest path, should we use an on-policy or off-policy algorithm? Explain why. Note: reading Chapter 6 of Sutton & Barto will help you answer this question.
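
To make the setup concrete, here is a rough Python sketch of the environment together with the two kinds of TD target that Chapter 6 contrasts: SARSA (on-policy) and Q-learning (off-policy). Only the rewards (+100, −1000, −1) come from the question. The 4x12 grid, the start and goal cells, the choice to end the episode on a fall, and all function names are assumptions made for illustration.

```python
import random

# Assumed layout (not given in the question): 4x12 grid, start bottom-left,
# goal bottom-right, cliff cells along the bottom row between them.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply a move and return (next_state, reward, done)."""
    r = min(max(state[0] + action[0], 0), ROWS - 1)
    c = min(max(state[1] + action[1], 0), COLS - 1)
    if (r, c) == GOAL:
        return (r, c), 100, True      # reward from the question
    if r == ROWS - 1 and 0 < c < COLS - 1:
        return (r, c), -1000, True    # assumption: a fall ends the episode
    return (r, c), -1, False          # every other move costs 1


def epsilon_greedy(Q, state, eps, rng=random):
    """Pick a random action index with probability eps, else a greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    values = [Q.get((state, a), 0.0) for a in range(len(ACTIONS))]
    return values.index(max(values))


def sarsa_target(Q, reward, next_state, next_action, done, gamma=1.0):
    """On-policy target: bootstrap from the action actually chosen next."""
    if done:
        return reward
    return reward + gamma * Q.get((next_state, next_action), 0.0)


def q_learning_target(Q, reward, next_state, done, gamma=1.0):
    """Off-policy target: bootstrap from the best-valued next action."""
    if done:
        return reward
    best = max(Q.get((next_state, a), 0.0) for a in range(len(ACTIONS)))
    return reward + gamma * best
```

As a quick check, take a hypothetical table `Q = {((2, 1), 1): -1000.0, ((2, 1), 3): 50.0}`, where action index 1 (down) from cell (2, 1) leads into the cliff. If the exploratory next action happens to be "down", `sarsa_target(Q, -1, (2, 1), 1, False)` gives -1001.0, while `q_learning_target(Q, -1, (2, 1), False)` gives 49.0. The only difference between the two is whether the exploratory action or the greedy action feeds the update, which is the distinction the question is asking about.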