(a) (2 points) When computing the reward for an MDP, what does a higher discount factor mean? (b) (2 points) What is the
Posted: Sun May 15, 2022 10:01 am
(a) (2 points) When computing the reward for an MDP, what does a higher discount factor mean? (b) (2 points) What is the difference between the V function (VH) and the Q function (Q+) of a policy ? (c) (2 points) Although policy evaluation takes less time than value iteration, what can be an advantage of using value iteration over policy evaluation? Consider the n state MDP below. In the state n, there is just one action that collects the reward of +10. For every other state there are two actions: right which moves determinis- tically one step to the next state, or reset which moves back to state 1. There is a reward factor of +1 for right and 0 for reset. The discount factor is y = +10 -n-1 2 +1 n +1 +1 (a) (2 points) What is the optimal policy Topt? (e) (2 points) What is the optimal value of state n (Vope(n))? (f) (4 points) Compute the optimal value function, Vopt(k) for all k = 1,...,n-1, in terms of Vopt(n). (g) (6 points) Suppose you are doing value iteration to compute the values V(k) for all k <n. You start with all value estimates equal to 0. Show the values for all k after 1 and 2 iterations respectively.