(3) Markov Decision Processes (Value Iteration and/or Policy Iteration)


[Figure: a three-state MDP over states A, B, and C, drawn as two diagrams, one showing the transition arrows for Action P and the other for Action Q; each arrow is labeled with its probability,reward pair.]

To avoid cluttering the diagram, I have drawn the arrows of Action P and Action Q on separate diagrams (in the assignments we drew the green and black arrows on the same figure). The probabilities and the rewards are given with each of the arrows.

(a) Take the discount rate γ, and let the initial values of all three states equal 0. Perform one iteration of value iteration and update the values of the three states.

(b) Take the initial policy as Action P from state A, and Action Q from states B and C. Take the same discount rate and initial values as in part (a). Perform one iteration of policy iteration; that is, compute the three equations in the three unknowns V(A), V(B), and V(C) under this policy. (Don't solve the equations; you don't even have to simplify them.)

(c) Assume that you got values for V(A), V(B), and V(C) (I am just making these values up, and didn't solve the above equations). Using these values, compute the next policy. (To compute the next policy, you need to compute the best action from each of the states A, B, and C, based on the values computed in part (b). To save time, just compute the best action from state A only.)
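Since the figure's arrow labels are not legible here, below is a minimal Python sketch of the three steps. The states, actions, and update rules match the problem, but every probability and reward in the model is a made-up placeholder, not taken from the figure, and the discount rate is an assumed value:

GAMMA = 0.9  # assumed discount rate; the problem's actual value is not legible

STATES = ["A", "B", "C"]
ACTIONS = ["P", "Q"]

# model[state][action] = list of (probability, next_state, reward) arrows.
# All numbers below are placeholders, NOT the ones from the figure.
model = {
    "A": {"P": [(0.3, "B", 1), (0.7, "C", 1)],
          "Q": [(0.9, "A", 1), (0.1, "B", 1)]},
    "B": {"P": [(0.2, "A", 5), (0.8, "C", 1)],
          "Q": [(0.4, "B", 3), (0.6, "C", 0)]},
    "C": {"P": [(0.3, "A", 2), (0.7, "B", 1)],
          "Q": [(0.8, "A", 4), (0.2, "C", 1)]},
}

def q_value(V, s, a):
    # Expected one-step return of action a in state s:
    # sum over arrows of p * (reward + GAMMA * V(next_state)).
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in model[s][a])

# (a) One sweep of value iteration starting from V = 0 in every state:
#     V_new(s) = max over a of sum over s' of p(s'|s,a) * (r + GAMMA * V(s'))
V0 = {s: 0.0 for s in STATES}
V1 = {s: max(q_value(V0, s, a) for a in ACTIONS) for s in STATES}
print("After one value-iteration sweep:", V1)

# (b) Policy evaluation for pi(A)=P, pi(B)=Q, pi(C)=Q sets up three linear
#     equations V(s) = sum over s' of p * (r + GAMMA * V(s')), one per state.
#     With the placeholder model, the equation for state A reads:
#     V(A) = 0.3*(1 + GAMMA*V(B)) + 0.7*(1 + GAMMA*V(C))
pi0 = {"A": "P", "B": "Q", "C": "Q"}

# (c) Policy improvement: pick the greedy action in each state given the
#     values. Here the values from (a) stand in for the made-up values.
pi1 = {s: max(ACTIONS, key=lambda a: q_value(V1, s, a)) for s in STATES}
print("Improved policy:", pi1)

Part (a) is one max-over-actions backup per state; part (b) only sets up the linear system without solving it, exactly as the question asks; part (c) is the greedy-improvement step that would be restricted to state A in the actual answer.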