2. (6pts) Consider the following environment represented as a directed graph, in which circles represent states, double circles represent the goal, and arrows represent actions. Assume you start at state 0. Assume all actions are deterministic. Transitioning to state 1 produces a reward of 0, while transitioning to state 2 produces a reward of 1.
- 2-a What are the optimal state-values and state-action-values for this environment?
- 2-b What is the optimal policy for this environment?
- 2-c Assume we introduce a discount factor of 0.95 into our value functions. Determine the new values of the state-value and state-action-value functions as well as the new optimal policy. Describe the effect of the discount factor on the optimal policy.
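Because every action is deterministic, the optimal values obey Q*(s, a) = r(s, a) + γ·V*(s′) and V*(s) = max_a Q*(s, a), where s′ is the state that action a leads to (treating the undiscounted case of 2-a as γ = 1 and 2-c as γ = 0.95). The figure is not reproduced in this post, so the sketch below runs value iteration on a hypothetical three-state graph (0 → 1, 0 → 2, 1 → 2, with state 2 assumed to be the terminal goal). That edge set and the helper names `EDGES`, `reward`, `value_iteration` and `greedy_policy` are assumptions for illustration only; they are not the figure's actual structure or the expected answer.

```python
# Value iteration on a small deterministic graph.
# ASSUMPTION: EDGES is a hypothetical stand-in for the question's figure,
# which is not available. Only the start state (0) and the rewards
# (0 for entering state 1, 1 for entering state 2) come from the question.

# Hypothetical transitions: state -> list of successor states (one action per edge).
EDGES = {0: [1, 2], 1: [2], 2: []}  # state 2 assumed terminal, so V(2) stays 0


def reward(next_state: int) -> float:
    """Reward for transitioning into next_state, as stated in the question."""
    return 1.0 if next_state == 2 else 0.0


def value_iteration(gamma: float, tol: float = 1e-12):
    """Iterate the deterministic Bellman optimality equations until V stops changing.

    gamma = 1 is only safe here because the hypothetical graph is acyclic
    and ends in a terminal state; in general undiscounted iteration may diverge.
    """
    V = {s: 0.0 for s in EDGES}
    while True:
        # Q(s, a) = r(s') + gamma * V(s'), where action a moves s to s'.
        Q = {(s, s2): reward(s2) + gamma * V[s2] for s in EDGES for s2 in EDGES[s]}
        # V(s) = max_a Q(s, a); terminal states (no actions) keep value 0.
        new_V = {s: max((Q[(s, s2)] for s2 in EDGES[s]), default=0.0) for s in EDGES}
        if max(abs(new_V[s] - V[s]) for s in EDGES) < tol:
            return new_V, Q
        V = new_V


def greedy_policy(Q, tol: float = 1e-9):
    """For each non-terminal state, list every successor whose Q-value ties the max."""
    policy = {}
    for s, succs in EDGES.items():
        if not succs:
            continue
        best = max(Q[(s, s2)] for s2 in succs)
        policy[s] = [s2 for s2 in succs if abs(Q[(s, s2)] - best) < tol]
    return policy


if __name__ == "__main__":
    for gamma in (1.0, 0.95):
        V, Q = value_iteration(gamma)
        print("gamma =", gamma, "V =", V, "Q =", Q, "policy =", greedy_policy(Q))
```

On this hypothetical graph, γ = 1 leaves state 0 indifferent between the two routes (both are worth 1), while γ = 0.95 values the detour through state 1 at only 0.95, so the greedy policy prefers the direct move to state 2. This is the usual effect of discounting: among paths that collect the same total reward, the shorter one is preferred.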