Sunday, July 31, 2011

Notes on Policy Improvement and Controlled Dynamic Systems

See introductory background.

A controlled dynamic system has inputs that can steer the evolution of the state of the system. With state \(x_{k}\) and input \(u_{k}\) at time step \(k\), the dynamics are

 

\(x_{k+1}=f\left(x_{k},u_{k}\right)\)

(1)

The inputs to the dynamic system can be determined by a policy, \(\pi\), that maps the state of the system to an input. Under a fixed policy, the controlled dynamic system behaves like an autonomous dynamic system.

 

\(x_{k+1}=f\left(x_{k},\pi\left(x_{k}\right)\right)=\widetilde{f}\left(x_{k}\right)\)

(2)
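As a minimal sketch of Equation (2), the snippet below closes the loop around a controlled system and iterates the resulting autonomous map. The scalar dynamics `f`, the proportional `policy`, and the helper names `closed_loop` and `simulate` are illustrative assumptions, not part of the original notes.

```python
def closed_loop(f, policy):
    """Return f_tilde(x) = f(x, policy(x)), the autonomous map of Eq. (2)."""
    return lambda x: f(x, policy(x))


def simulate(f_tilde, x0, steps):
    """Iterate x_{k+1} = f_tilde(x_k); return [x_0, ..., x_steps]."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f_tilde(xs[-1]))
    return xs


# Hypothetical example: scalar integrator x_{k+1} = x_k + u_k.
f = lambda x, u: x + u
# Hypothetical policy: proportional feedback that halves the state each step.
policy = lambda x: -0.5 * x

trajectory = simulate(closed_loop(f, policy), x0=8.0, steps=3)
print(trajectory)
```

Once the policy is fixed, `simulate` never sees the input: the closed-loop map depends on the state alone, which is what makes the system autonomous.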

Given a discounted cost of operation for the dynamic system, with stage cost \(c\) and discount factor \(\alpha\),

 

\(J\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},u_{k}\right)\),

(3)

a value function, which depends on the control policy and the initial state, can be found using a variation of dynamic programming. This value function satisfies the recursion

 

\(V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)\)

(4)
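One way to compute the value function of a fixed policy is to apply the recursion repeatedly until it stops changing. The sketch below assumes a finite state set and a discount factor with \(0\leq\alpha<1\); the five-state system, the stage cost, and the function name `evaluate_policy` are hypothetical choices made only for illustration.

```python
def evaluate_policy(states, policy, f, c, alpha, tol=1e-10, max_iters=10_000):
    """Iterate V(x) <- c(x, pi(x)) + alpha * V(f(x, pi(x))) to a fixed point.

    Assumes a finite state set and 0 <= alpha < 1 so the iteration converges.
    """
    V = {x: 0.0 for x in states}
    for _ in range(max_iters):
        V_new = {
            x: c(x, policy[x]) + alpha * V[f(x, policy[x])]
            for x in states
        }
        if max(abs(V_new[x] - V[x]) for x in states) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical system: states 0..4, inputs move the state by -1, 0, or +1.
states = list(range(5))
f = lambda x, u: min(max(x + u, 0), 4)   # dynamics, clamped to the state set
c = lambda x, u: x + 0.5 * abs(u)        # stage cost: state plus input effort
alpha = 0.5                              # discount factor

# Hypothetical policy: never move, so the cost x is paid at every step.
stay = {x: 0 for x in states}
V = evaluate_policy(states, stay, f, c, alpha)
print(V)
```

For this "stay" policy the discounted sum can be checked by hand: the state never changes, so the value is \(x/(1-\alpha)=2x\), and the iteration should agree up to the tolerance.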

One engineering challenge with a controlled dynamic system is optimizing its performance. Policy improvement provides some insight into how to incrementally improve a policy. The key idea in policy improvement is that if a change to the policy at some state lowers the sum of the immediate cost and the discounted future cost, then the change improves the policy. If

 

\(c\left(x,u\right)+\alpha\cdot V^{\pi}\left(f\left(x,u\right)\right)\leq V^{\pi}\left(x\right)\)

(5)

then the choice of \(u\) at \(x\) is an improvement on the policy \(\pi\): the operating cost does not increase, and it decreases when the inequality is strict.
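The improvement test can be sketched on a small finite example. The snippet below checks the inequality for a candidate input and then picks, at every state, the input with the smallest one-step lookahead cost. The five-state system, the stage cost, and the names `lookahead`, `is_improvement`, and `improve_policy` are hypothetical. The value function `V` is that of a policy that never moves, which works out by hand to \(x/(1-\alpha)=2x\) for \(\alpha=0.5\).

```python
def lookahead(x, u, V, f, c, alpha):
    """Immediate cost of u at x plus the discounted value of the next state."""
    return c(x, u) + alpha * V[f(x, u)]


def is_improvement(x, u, V, f, c, alpha):
    """Return True when choosing u at x satisfies the improvement inequality."""
    return lookahead(x, u, V, f, c, alpha) <= V[x]


def improve_policy(states, inputs, V, f, c, alpha):
    """At each state pick the input with the smallest lookahead cost."""
    return {
        x: min(inputs, key=lambda u: lookahead(x, u, V, f, c, alpha))
        for x in states
    }


# Hypothetical system: states 0..4, inputs move the state by -1, 0, or +1.
states = list(range(5))
inputs = [-1, 0, 1]
f = lambda x, u: min(max(x + u, 0), 4)   # dynamics, clamped to the state set
c = lambda x, u: x + 0.5 * abs(u)        # stage cost: state plus input effort
alpha = 0.5                              # discount factor

# Value function of the "never move" policy, computed by hand: V(x) = 2x.
V = {x: 2.0 * x for x in states}

print(is_improvement(2, -1, V, f, c, alpha))   # 2.5 + 0.5 * 2.0 = 3.5 <= 4.0
improved = improve_policy(states, inputs, V, f, c, alpha)
print(improved)
```

Taking the minimizing input at every state is one common way to apply the test, but any input that satisfies the inequality qualifies as an improvement. Here the current input \(u=0\) always meets the inequality with equality, so the new policy is never worse than the old one.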

This work is licensed under a Creative Commons Attribution By license.
