(3) Markov Decision Processes (Value Iteration and/or Policy Iteration)


[Figure: a three-state MDP over states A, B, and C, drawn as two diagrams, one showing the transition arrows for Action P and the other for Action Q; each arrow is labeled with its probability,reward pair.]

To avoid cluttering the diagram, I have drawn the arrows of Action P and Action Q on separate diagrams (in the assignments we drew the green and black arrows on the same figure). The probabilities and the rewards are given with each of the arrows.

(a) Take the discount rate γ, and let the initial values of all three states equal 0. Perform one iteration of value iteration and update the values of the three states.

(b) Take the initial policy as Action P from state A, and Action Q from states B and C. Take the same discount rate and initial values as in part (a). Perform one iteration of policy iteration; that is, compute the three equations in the three unknowns V(A), V(B), and V(C) under this policy. (Don't solve the equations; you don't even have to simplify them.)

(c) Assume that you got values for V(A), V(B), and V(C) (I am just making these values up, and didn't solve the above equations). Using these values, compute the next policy. (To compute the next policy, you need to compute the best action from each of the states A, B, and C, based on the values computed in part (b). To save time, just compute the best action from state A only.)
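Since the figure's arrow labels are not legible here, below is a minimal Python sketch of the three steps. The states, actions, and update rules match the problem, but every probability and reward in the model is a made-up placeholder, not taken from the figure, and the discount rate is an assumed value:

GAMMA = 0.9  # assumed discount rate; the problem's actual value is not legible

STATES = ["A", "B", "C"]
ACTIONS = ["P", "Q"]

# model[state][action] = list of (probability, next_state, reward) arrows.
# All numbers below are placeholders, NOT the ones from the figure.
model = {
    "A": {"P": [(0.3, "B", 1), (0.7, "C", 1)],
          "Q": [(0.9, "A", 1), (0.1, "B", 1)]},
    "B": {"P": [(0.2, "A", 5), (0.8, "C", 1)],
          "Q": [(0.4, "B", 3), (0.6, "C", 0)]},
    "C": {"P": [(0.3, "A", 2), (0.7, "B", 1)],
          "Q": [(0.8, "A", 4), (0.2, "C", 1)]},
}

def q_value(V, s, a):
    # Expected one-step return of action a in state s:
    # sum over arrows of p * (reward + GAMMA * V(next_state)).
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in model[s][a])

# (a) One sweep of value iteration starting from V = 0 in every state:
#     V_new(s) = max over a of sum over s' of p(s'|s,a) * (r + GAMMA * V(s'))
V0 = {s: 0.0 for s in STATES}
V1 = {s: max(q_value(V0, s, a) for a in ACTIONS) for s in STATES}
print("After one value-iteration sweep:", V1)

# (b) Policy evaluation for pi(A)=P, pi(B)=Q, pi(C)=Q sets up three linear
#     equations V(s) = sum over s' of p * (r + GAMMA * V(s')), one per state.
#     With the placeholder model, the equation for state A reads:
#     V(A) = 0.3*(1 + GAMMA*V(B)) + 0.7*(1 + GAMMA*V(C))
pi0 = {"A": "P", "B": "Q", "C": "Q"}

# (c) Policy improvement: pick the greedy action in each state given the
#     values. Here the values from (a) stand in for the made-up values.
pi1 = {s: max(ACTIONS, key=lambda a: q_value(V1, s, a)) for s in STATES}
print("Improved policy:", pi1)

Part (a) is one max-over-actions backup per state; part (b) only sets up the linear system without solving it, exactly as the question asks; part (c) is the greedy-improvement step that would be restricted to state A in the actual answer.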