Problem 2. Answer the following questions and show your work for questions b. and c. a. In value iteration, let k be the

Posted: Sat May 14, 2022 7:11 pm
by answerhappygod
Problem 2. Answer the following questions and show your work for questions b. and c.

a. In value iteration, let k be the iteration index. Write the formula to update Q_k(s,a) from R(s,a,s'), T(s,a,s'), V_{k-1}(s'), and γ, and write the formula to compute V_k(s) from Q_k(s,a).

Q_k(s,a) =
V_k(s) =

b. Consider the MDP with transition model and reward function as given in the table below. Triangle is player MAX. Assume the discount factor γ = 1. Given that V_0(s) = 0 for both states, fill in the values for V_1, V_2, Q_1, Q_2 in the figure below.

s   a   s'   T(s,a,s')   R(s,a,s')
A   1   A    0           0
A   1   B    1           0
A   2   A    1           1
A   2   B    0           0
A   3   A    0.5         0
A   3   B    0.5         0
B   1   A    0.5         10
B   1   B    0.5         0
B   2   A    1           0
B   2   B    0           0
B   3   A    0.5         2
B   3   B    0.5         4

[Figure: layered diagram to fill in. From top to bottom: V_2 nodes (A, B); Q_2 nodes (A,1), (A,2), (A,3), (B,1), (B,2), (B,3); V_1 nodes (A, B); Q_1 nodes (A,1), (A,2), (A,3), (B,1), (B,2), (B,3); V_0 nodes (A, B).]
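For part a, the standard value iteration updates are Q_k(s,a) = Σ_{s'} T(s,a,s') [R(s,a,s') + γ V_{k-1}(s')] and V_k(s) = max_a Q_k(s,a). A minimal Python sketch of part b under those updates follows. It assumes the flattened table is read row by row as (T, R) pairs, first for state A and then for state B, which is the reading under which each action's transition probabilities sum to 1. The names MDP, STATES, GAMMA, and value_iteration are illustrative and do not come from the problem statement.

```python
# (state, action) -> list of (next_state, T, R).
# Assumption: values are read row by row from the table in the question.
MDP = {
    ("A", 1): [("A", 0.0, 0), ("B", 1.0, 0)],
    ("A", 2): [("A", 1.0, 1), ("B", 0.0, 0)],
    ("A", 3): [("A", 0.5, 0), ("B", 0.5, 0)],
    ("B", 1): [("A", 0.5, 10), ("B", 0.5, 0)],
    ("B", 2): [("A", 1.0, 0), ("B", 0.0, 0)],
    ("B", 3): [("A", 0.5, 2), ("B", 0.5, 4)],
}
STATES = ("A", "B")
GAMMA = 1.0  # discount factor given in part b


def value_iteration(mdp, states, gamma, num_iters):
    """Return lists V and Q indexed by iteration k (Q[0] is unused)."""
    V = [{s: 0.0 for s in states}]  # V_0(s) = 0 for every state
    Q = [None]
    for k in range(1, num_iters + 1):
        # Q_k(s,a) = sum over s' of T * (R + gamma * V_{k-1}(s'))
        q_k = {
            (s, a): sum(t * (r + gamma * V[k - 1][s2]) for s2, t, r in outcomes)
            for (s, a), outcomes in mdp.items()
        }
        # V_k(s) = max over actions of Q_k(s,a)
        v_k = {s: max(q for (s_, _), q in q_k.items() if s_ == s) for s in states}
        Q.append(q_k)
        V.append(v_k)
    return V, Q


V, Q = value_iteration(MDP, STATES, GAMMA, 2)
for k in (1, 2):
    print(f"Q{k} =", Q[k])
    print(f"V{k} =", V[k])
```

Under the assumed table, this gives V_1(A) = 1, V_1(B) = 5, V_2(A) = 5, and V_2(B) = 8. Working one entry by hand as a check: Q_2(B,1) = 0.5 (10 + V_1(A)) + 0.5 (0 + V_1(B)) = 0.5 (11) + 0.5 (5) = 8.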