
- A 1 Finite State Systems Imperfect State Information Consider A System That At Any Time Can Be In Any One Of A Finite 1 (95.28 KiB) Viewed 33 times
a 1. Finite-State Systems - Imperfect State Information Consider a system that at any time can be in any one of a finite number of states 1, 2, ..., n. When a control u is applied, the system moves from state i to state j with probability Pij(u). The control u is chosen from a finite collection ut, u?..., Following each state transition, an observation is made by the controller. There is a finite number of possible observation outcomes z1,z2, ...,z4. The probability of occurrence of 2*, given that the current state is j and the preceding control was u, is denoted by r;(u,), 0 = 1, ..., 4. (a) Consider the column vector of conditional probabilities Pa = [Pk..., PH] where p=P(Ik = j | 70, ..., Zk, U9, ---, ...U-1), j = 1,...,n and show that it can be updated according to Pk+1 - PI Pij (Uk) r; (Uk, 2k+1) Σ j = 1,..., Es=1 X-1 PI Pis (uk) rs (Uk, zk+1) Write this equation in the compact form [r (Uk, 2k+1)] + [P(UK)' P] Pk+1 r(Uk, zk+1) P(UK) PE where prime denotes transposition and P(uk) is the nx n transition probability matrix with ij th element Pij (uk), r(Uk, 2k+1) is the column vector with j th coordinate r; (Uk, 2x+1), 1
[P(uk)' Ps], is the j th coordinate of the vector P (uk) Pk, [r (uk, zk+1)]* [P (uk)' Pk] is the vector whose j th coordinate is the scalar rj (Uk, 2k+1) [P(uk)' Pk];. (b) Assume that there is a cost for each stage k denoted gx(i, u, j) and associated with the control u and a transition from i to j. There is no terminal cost. Consider the problem of finding a policy that minimizes the sum of expected costs per stage over N periods. Show that the corresponding DP algorithm is given by JN-1 (PN-1) = min PN-GN-1(u), UE{u?....,m} J* (P) = min (PGk(u) UE{u....,} [r(u,0)] [P(u) Pd] +r(4,0)P(u) PxJx+1 r(u,0)P(u)'P -

] @=1 where G (u) is the vector of expected k th stage costs given by 2-1 Puj(u)gx(1, u, j) G&(u) = 2-1 Pnj(u)g:(nu, j) (e) Show that, for all k, Tk when viewed as a function defined on the set of vectors with nonnegative coordinates, is positively homogeneous; that is, Jx (AP) = \J(P) for all x > 0. Use this fact to write the DP algorithm in the simpler form - 9 JK (P) = weregnin, PG2(u ) + Şikt (bru,9). PGx(u) + Ž Jets (krlu0) – [P(wyrz) +1 UE{u,...,} B=1 (d) Show by induction that, for all k, he has the form JK (Pk) = min (Pak..., Pam] where a, ..., como o are some vectors in R. mk