## Problem Statement

How do you validate a model of a system against the physical system when a controller is necessary to make the system operate and the operational policies of the two controllers were developed independently?

## Discussion

Consider a development process with a well-defined operational cost model that drives both the model and the physical system toward minimizing that operational cost. Suppose the model uses full state feedback for control, while the physical system uses either full state feedback or output feedback with state estimators and relies on certainty equivalence. Under identical test conditions, there is no guarantee that the cost trajectories and the state trajectories will be identical, even if both systems are optimal with respect to the operational cost. This is because the optimal operational cost is unique, but there is no guarantee that the optimal trajectories are unique. Furthermore, if the controls in both cases are only near optimal rather than globally optimal, non-unique trajectories are even more likely. However, because the optimal operational cost is unique, the validation exercise can be decomposed into two steps.
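
As a minimal sketch of the non-uniqueness argument, consider a hypothetical four-state shortest-path problem (the states, actions, and costs below are invented for illustration). Two different policies reach the goal at the same optimal cost while visiting different states, so comparing state trajectories alone would wrongly suggest a mismatch.

```python
# Hypothetical problem: EDGES[state][action] = (next_state, stage_cost).
EDGES = {
    "A": {"up": ("B", 1.0), "down": ("C", 1.0)},
    "B": {"go": ("D", 1.0)},
    "C": {"go": ("D", 1.0)},
}
GOAL = "D"


def rollout(policy, start="A"):
    """Follow a policy from start to GOAL; return (trajectory, total cost)."""
    state, path, total = start, [start], 0.0
    while state != GOAL:
        state, cost = EDGES[state][policy[state]]
        path.append(state)
        total += cost
    return path, total


# Two independently developed policies that differ only at state A.
model_policy = {"A": "up", "B": "go", "C": "go"}
plant_policy = {"A": "down", "B": "go", "C": "go"}

model_path, model_cost = rollout(model_policy)
plant_path, plant_cost = rollout(plant_policy)
print(model_path, model_cost)
print(plant_path, plant_cost)
```

Both rollouts have the same total cost but different paths, which is why the cost is the quantity to compare first.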

First, the equations that model the physics can be validated against test data from the physical system: measure the states of the real system, then replace the integrator in the model with those state measurements. Ideally, the physical system could execute both policies. The errors in the cost and derivative calculations can then be compared to quantify the error between the model of the physics and the real physics.
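
One way to picture this first step is sketched below, assuming a hypothetical first-order model `dx/dt = -a*x + u` and synthetic "measurements" generated by a plant with a different value of `a`. Instead of integrating the model, the model's derivative is evaluated at the measured states and compared with the finite-difference derivative of the data. The function names, parameter values, and data are illustrative assumptions only.

```python
def model_derivative(x, u, a=0.8):
    """Hypothetical first-order model dx/dt = -a*x + u (a is the model parameter)."""
    return -a * x + u


def derivative_residuals(x_meas, u_meas, dt, f):
    """Evaluate the model at measured states instead of integrating it.

    Returns f(x_k, u_k) minus the finite-difference derivative of the data.
    """
    residuals = []
    for k in range(len(x_meas) - 1):
        measured_rate = (x_meas[k + 1] - x_meas[k]) / dt
        residuals.append(f(x_meas[k], u_meas[k]) - measured_rate)
    return residuals


# Synthetic test data: a plant with a = 1.0, stepped with forward Euler.
dt, a_true = 0.01, 1.0
x_meas, u_meas = [2.0], []
for k in range(200):
    u_meas.append(0.5)
    x_meas.append(x_meas[-1] + dt * (-a_true * x_meas[-1] + u_meas[-1]))

res = derivative_residuals(x_meas, u_meas, dt, model_derivative)
rms = (sum(r * r for r in res) / len(res)) ** 0.5
print(f"first residual = {res[0]:.4f}, RMS residual = {rms:.4f}")
```

Because the states come from measurements, the residual isolates the error in the physics equations from any difference between the controllers. The same substitution can be applied to the stage cost.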

Second, once the errors in the model of the physics are quantified, the error in the costs under the different controllers can be quantified. Ideally, this step establishes how close each controller is to a globally optimal (cost-minimizing) controller. One interesting possibility is to use policy improvement to see whether the independently developed policies can be merged for better performance. Alternatively, if there are unexplained differences, then the constraints respected by the different policies need to be reconciled. Robustness may also contribute to the differences; in general, it is driven by different views of noise and risk sensitivity. Beyond that, the equation structure of the different policies may lead to different performance limitations. Again, policy improvement may provide a way to identify these structure-imposed limitations.
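
A minimal sketch of merging two policies by one step of policy improvement follows, on a hypothetical four-state deterministic problem (states, actions, costs, and the discount factor are all invented for illustration). Each policy is evaluated; then, at every state, the merged policy picks whichever of the two candidate actions has the lower one-step lookahead cost against the pointwise minimum of the two value functions.

```python
ALPHA = 0.9
STATES = [0, 1, 2, 3]            # state 3 is absorbing and cost-free
JUMP_COST = {0: 5.0, 1: 1.5, 2: 3.0}


def step(x, a):
    """Hypothetical deterministic model: return (next_state, stage_cost)."""
    if x == 3:
        return 3, 0.0
    if a == "step":
        return x + 1, 1.0
    return 3, JUMP_COST[x]       # a == "jump"


def evaluate(policy, sweeps=200):
    """Iterate V(x) = c(x, pi(x)) + alpha * V(f(x, pi(x))) to a fixed point."""
    V = {x: 0.0 for x in STATES}
    for _ in range(sweeps):
        new_V = {}
        for x in STATES:
            nxt, cost = step(x, policy[x])
            new_V[x] = cost + ALPHA * V[nxt]
        V = new_V
    return V


def merge(policy_a, policy_b):
    """At each state, keep the candidate action with the lower lookahead cost."""
    V_a, V_b = evaluate(policy_a), evaluate(policy_b)
    V_min = {x: min(V_a[x], V_b[x]) for x in STATES}
    merged = {}
    for x in STATES:
        def q(a):
            nxt, cost = step(x, a)
            return cost + ALPHA * V_min[nxt]
        merged[x] = min((policy_a[x], policy_b[x]), key=q)
    return merged


always_step = {x: "step" for x in STATES}
always_jump = {x: "jump" for x in STATES}
merged = merge(always_step, always_jump)
V_merged = evaluate(merged)
print(merged)
print(V_merged)
```

In this toy problem the merged policy steps from state 0 and jumps from state 1, which neither original policy does, and its cost from state 0 is lower than either original. A state where the merged policy cannot improve on either candidate hints at a limitation shared by both policy structures.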

## Wednesday, August 3, 2011

### Random Notes: Thoughts on metrics for engineering tools

• If training is required for successful usage, what is the average half-life of the training? In other words, if a group of users is trained, how long until 50% of the users forget some key aspect of tool usage, driving them to abandon the tool?
• What is the average time before a user who does not use the tool constantly needs to go to the help files to complete a task?
• Can a user successfully use the tool without training?
• Are the documentation and examples sufficient for self learning?
• How many actions are required to complete a ‘quickstart’ example?
• How many decisions are required to complete a ‘quickstart’ example?
• How many choices are there in each decision in a typical workflow?
• How difficult is it to integrate the tool into automated workflows?
• How difficult is it to customize the tool?
• Can a power user customize the tool?
• How long does it take to introduce a new feature in the tool?
• How many sentences does it take to describe why a user should adopt the tool?
• In the absence of process enforcement, would the users naturally adopt this solution?
• What is the time saving for the individual, team, and organization from the adoption of the tool?
• If the tool reduces error rates, is there feedback to the users to help them understand the improvement?
• Can the input and output to the tool be reused so that the effort can be reapplied?
• What is the ‘activation potential’ to get a new user to adopt the tool?
• In a corporate setting, how difficult are the permissions to manage?
• If a new user is not set up, will the team be able to duplicate the permissions without calling the developers?

## Deterministic Cost Models

| Description | Cost Model | Dynamic Programming Equations | Restrictions |
| --- | --- | --- | --- |
| Finite Horizon Total Cost | $$J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right)\right)$$ | $$V_{k}^{\pi}\left(x\right)=c_{k}\left(x,\pi\left(x\right)\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)$$, $$\forall k\in\left\{ 0,\cdots,K-1\right\}$$<br>$$V_{K}^{\pi}\left(x\right)=c_{K}\left(x,\pi\left(x\right)\right)$$ | $$0\leq\alpha<1$$ |
| Infinite Horizon Total Cost | $$J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)$$ | $$V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)$$ | $$0\leq\alpha<1$$ |
| Finite Horizon Shortest Path | $$J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right)\right)$$ | $$V_{k}^{\pi}\left(x\right)=c_{k}\left(x,\pi\left(x\right)\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)$$, $$\forall k\in\left\{ 0,\cdots,K-1\right\}$$<br>$$V_{K}^{\pi}\left(x\right)=c_{K}\left(x,\pi\left(x\right)\right)$$ | $$0\leq\alpha\leq1$$<br>$$\left\{ x\in\chi\mid c\left(x,\pi\left(x\right)\right)=0\right\} \neq\emptyset$$ |
| Infinite Horizon Shortest Path | $$J^{\pi}\left(x_{0}\right)=\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)$$ | $$V^{\pi}\left(x\right)=c\left(x,\pi\left(x\right)\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)$$ | $$0\leq\alpha\leq1$$<br>$$\left\{ x\in\chi\mid c\left(x,\pi\left(x\right)\right)=0\right\} \neq\emptyset$$ |
| Average Cost | $$J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\lim}\frac{1}{K}\sum_{k=0}^{K}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right)\right)$$ | $$V^{\pi}\left(x\right)+\lambda=c\left(x,\pi\left(x\right)\right)+V^{\pi}\left(f\left(x,\pi\left(x\right)\right)\right)$$ | $$0\leq\alpha<1$$<br>$$V^{\pi}\left(x_{ref}\right)=0$$ for some $$x_{ref}\in\chi$$ |
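
To make the finite horizon total cost row concrete, here is a small sketch that checks the backward recursion for $$V_{k}^{\pi}$$ against the directly summed cost $$J^{\pi}\left(x_{0}\right)$$. The integer state space, the dynamics `f`, the stage cost `c`, and the policy are hypothetical choices made only for this illustration.

```python
def rollout_cost(x0, policy, f, c, alpha, K):
    """J(x0) = sum_{k=0}^{K} alpha^k * c_k(x_k, pi(x_k))."""
    x, total = x0, 0.0
    for k in range(K + 1):
        u = policy(x)
        total += alpha ** k * c(k, x, u)
        if k < K:
            x = f(x, u)
    return total


def backward_values(states, policy, f, c, alpha, K):
    """V_K(x) = c_K(x, pi(x)); V_k(x) = c_k(x, pi(x)) + alpha * V_{k+1}(f(x, pi(x)))."""
    V = {x: c(K, x, policy(x)) for x in states}
    for k in range(K - 1, -1, -1):
        V = {x: c(k, x, policy(x)) + alpha * V[f(x, policy(x))] for x in states}
    return V  # this is V_0


# Hypothetical problem: integer states 0..5, drive the state toward 0.
def f(x, u):
    return max(0, min(5, x + u))


def c(k, x, u):
    return float(x * x + u * u)   # time-invariant here, so k is unused


def policy(x):
    return -1 if x > 0 else 0


alpha, K = 0.9, 4
J = rollout_cost(3, policy, f, c, alpha, K)
V0 = backward_values(range(6), policy, f, c, alpha, K)
print(J, V0[3])
```

The two numbers agree up to floating-point rounding, which is the consistency the table expresses between the cost model and the dynamic programming equations.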

## Stochastic Cost Models

| Description | Cost Model | Dynamic Programming Equations | Restrictions |
| --- | --- | --- | --- |
| Finite Horizon Total Cost | $$J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right),w\right)\right]$$ | $$V_{k}^{\pi}\left(x\right)=E^{W}\left[c_{k}\left(x,\pi\left(x\right),w\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]$$<br>$$V_{K}^{\pi}\left(x\right)=E^{W}\left[c_{K}\left(x,\pi\left(x\right)\right)\right]$$ | $$0\leq\alpha<1$$ |
| Infinite Horizon Total Cost | $$J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]$$ | $$V^{\pi}\left(x\right)=E^{W}\left[c\left(x,\pi\left(x\right),w\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]$$ | $$0\leq\alpha<1$$ |
| Finite Horizon Shortest Path | $$J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{K}\alpha^{k}\cdot c_{k}\left(x_{k},\pi\left(x_{k}\right),w\right)\right]$$ | $$V_{k}^{\pi}\left(x\right)=E^{W}\left[c_{k}\left(x,\pi\left(x\right),w\right)+\alpha\cdot V_{k+1}^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]$$<br>$$V_{K}^{\pi}\left(x\right)=E^{W}\left[c_{K}\left(x,\pi\left(x\right)\right)\right]$$ | $$0\leq\alpha\leq1$$<br>$$\left\{ x\in\chi\mid c\left(x,\pi\left(x\right)\right)=0\right\} \neq\emptyset$$ |
| Infinite Horizon Shortest Path | $$J^{\pi}\left(x_{0}\right)=E^{W}\left[\sum_{k=0}^{\infty}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]$$ | $$V^{\pi}\left(x\right)=E^{W}\left[c\left(x,\pi\left(x\right),w\right)+\alpha\cdot V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]$$ | $$0\leq\alpha\leq1$$<br>$$\left\{ x\in\chi\mid c\left(x,\pi\left(x\right)\right)=0\right\} \neq\emptyset$$ |
| Average Cost | $$J^{\pi}\left(x_{0}\right)=E^{W}\left[\underset{K\rightarrow\infty}{\lim}\frac{1}{K}\sum_{k=0}^{K}\alpha^{k}\cdot c\left(x_{k},\pi\left(x_{k}\right),w\right)\right]$$ | $$V^{\pi}\left(x\right)+\lambda=E^{W}\left[c\left(x,\pi\left(x\right),w\right)+V^{\pi}\left(f\left(x,\pi\left(x\right),w\right)\right)\right]$$ | $$0\leq\alpha<1$$<br>$$V^{\pi}\left(x_{ref}\right)=0$$ for some $$x_{ref}\in\chi$$ |
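
The infinite horizon total cost row can be sketched as an iterative policy evaluation in which the expectation over the disturbance is a weighted sum. The two-state machine below (run until a fault, then repair), its disturbance distribution, and its costs are hypothetical and exist only to give the recursion something to act on.

```python
ALPHA = 0.9
W_DIST = {0: 0.8, 1: 0.2}   # hypothetical disturbance: w = 1 means a fault occurs


def policy(x):
    return "run" if x == 0 else "repair"


def f(x, u, w):
    """Next state: a running machine (x = 0) breaks when w = 1; repair restores it."""
    return w if u == "run" else 0


def c(x, u, w):
    """Stage cost: running costs 1 plus a fault penalty; repair costs 4."""
    return 1.0 + 2.0 * w if u == "run" else 4.0


def evaluate(states, sweeps=500):
    """Iterate V(x) = E_w[c(x, pi(x), w) + alpha * V(f(x, pi(x), w))]."""
    V = {x: 0.0 for x in states}
    for _ in range(sweeps):
        V = {
            x: sum(p * (c(x, policy(x), w) + ALPHA * V[f(x, policy(x), w)])
                   for w, p in W_DIST.items())
            for x in states
        }
    return V


V = evaluate([0, 1])
print(V)
```

With $$0\leq\alpha<1$$ the update is a contraction, so the sweeps converge regardless of the starting guess. For this toy problem the fixed point can also be solved by hand from the two linear equations, which gives a useful cross-check.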

## Risk Aware/Averse Stochastic Cost Models

| Description | Cost Model | Dynamic Programming Equations | Restrictions |
| --- | --- | --- | --- |
| Certainty Equivalence with exponential utility | $$J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{K}\cdot\frac{1}{\gamma}\cdot\ln\left(E^{W}\left[\exp\left(\gamma\cdot\sum_{k=0}^{K-1}c\left(x_{k},\pi\left(x_{k}\right),w\right)\right)\right]\right)$$ | | |
| Mean-Variance | | | |
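
The core of the certainty equivalence row is the operator $$\frac{1}{\gamma}\ln E\left[\exp\left(\gamma X\right)\right]$$ applied to an accumulated cost $$X$$. The sketch below evaluates only that operator, not the $$\limsup$$ over the horizon, and it assumes the usual form with $$\gamma$$ inside the exponent. The two-outcome cost distribution is hypothetical.

```python
import math


def certainty_equivalent(outcomes, gamma):
    """(1/gamma) * ln E[exp(gamma * X)] for a discrete distribution {cost: prob}."""
    # Shift by the largest exponent (log-sum-exp) to avoid overflow.
    m = max(gamma * x for x in outcomes)
    s = sum(p * math.exp(gamma * x - m) for x, p in outcomes.items())
    return (m + math.log(s)) / gamma


costs = {1.0: 0.9, 10.0: 0.1}   # hypothetical total-cost distribution
mean = sum(x * p for x, p in costs.items())
print("mean", mean)
for gamma in (1e-6, 0.1, 1.0):
    print(gamma, certainty_equivalent(costs, gamma))
```

For small positive $$\gamma$$ the certainty equivalent approaches the mean cost, and as $$\gamma$$ grows it moves toward the worst outcome. That is how this cost model encodes risk aversion.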

## Cost Models That Don’t Work or Have Issues

| Description | Cost Model | Issues |
| --- | --- | --- |
| Expected exponential disutility | $$J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{K}\cdot E^{W}\left[\textrm{sgn}\left(\gamma\right)\cdot\exp\left(\gamma\cdot\sum_{k=0}^{K-1}c\left(x_{k},\pi\left(x_{k}\right),w\right)\right)\right]$$ | Does not discriminate among policies |
| Different version of expected exponential disutility | $$J^{\pi}\left(x_{0}\right)=\underset{K\rightarrow\infty}{\limsup}\frac{1}{\gamma}\cdot\log\left(E^{W}\left[\exp\left(\gamma\cdot\frac{\gamma}{K}\sum_{k=0}^{K-1}c\left(x_{k},\pi\left(x_{k}\right),w\right)\right)\right]\right)$$ | Generally reduces to the average cost |