Risk-averse linearly-solvable control
05 Mar 2018
To get into this topic, I highly recommend the paper Efficient computation of optimal actions, together with its supplement, by Emo Todorov. Here, I first summarize the derivations from that paper in continuous time with a conventional cumulative cost objective; after that, I repeat the derivations with an exponential objective. The result of this exercise is a pair of formulas that relate costs under the optimal controller to costs under the uncontrolled dynamics.
Consider the controlled diffusion
\begin{equation*} dx = a(x) dt + b(x) (udt + \sigma dw) \end{equation*}
with the generator
\begin{equation*} \newcommand{\J}{\mathcal{J}} \newcommand{\L}{\mathcal{L}} \newcommand{\p}{\partial} \L^u[\cdot](t, x) = (a(x) + b(x)u)\p_x + \frac{1}{2}b^2(x)\sigma^2 \p_x^2 \end{equation*}
and the cost rate
\begin{equation*} c(x, u) = q(x) + \frac{u^2}{2\sigma^2}. \end{equation*}
Additive objective
Trajectory cost
\begin{equation*} \J^{u(\cdot)}(t, x) = \int_t^T c(x(\tau), u(\tau)) d\tau + Q(x(T)) \end{equation*}
Expected cost
\begin{equation*} J^{u(\cdot)}(t, x) = E \left[ \J^{u(\cdot)}(t, x) \; | \; x(t) = x \right] \end{equation*}
Value function
\begin{equation*} v(t, x) = \min_{u(\cdot)} \left\{ J^{u(\cdot)}(t, x) \right\} \end{equation*}
Hamilton-Jacobi-Bellman (HJB) equation
\begin{equation*} -v_t(t, x) = \min_u \left\{ c(x, u) + \L^u v(t, x) \right\} \end{equation*}
HJB expanded
\begin{equation*} -v_t = \min_u \left\{ q(x) + \frac{u^2}{2\sigma^2} + (a+bu)v_x + \frac{1}{2} b^2 \sigma^2 v_{xx} \right\} \end{equation*}
Optimal control
\begin{equation*} u(t, x) = -\sigma^2 b(x) v_x (t, x) \end{equation*}
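This follows from the first-order condition. Only two terms inside the braces depend on $u$, and their sum is convex in $u$, so the stationary point is the minimizer:
\begin{equation*} \frac{\partial}{\partial u} \left\{ \frac{u^2}{2\sigma^2} + b u v_x \right\} = \frac{u}{\sigma^2} + b v_x = 0 \quad \Longrightarrow \quad u = -\sigma^2 b v_x. \end{equation*}
At this minimizer the two $u$-dependent terms evaluate to
\begin{equation*} \frac{u^2}{2\sigma^2} + b u v_x = \frac{1}{2} b^2 \sigma^2 v_x^2 - b^2 \sigma^2 v_x^2 = -\frac{1}{2} b^2 \sigma^2 v_x^2. \end{equation*}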
After substituting the optimal controller into the HJB equation
\begin{equation*} -v_t = q + av_x + \frac{1}{2}b^2\sigma^2v_{xx} - \frac{1}{2}b^2\sigma^2v_x^2 \end{equation*}
Exponential transform
\begin{equation*} z(t, x) = e^{-v(t, x)} \end{equation*}
Linearized HJB
\begin{equation*} -z_t = -qz + \L^0 z \end{equation*}
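The intermediate step can be sketched as follows. Writing $v = -\log z$, the derivatives are
\begin{equation*} v_t = -\frac{z_t}{z}, \qquad v_x = -\frac{z_x}{z}, \qquad v_{xx} = -\frac{z_{xx}}{z} + \frac{z_x^2}{z^2}, \end{equation*}
so the quadratic terms cancel in the combination that appears in the HJB equation:
\begin{equation*} \frac{1}{2} b^2 \sigma^2 \left( v_{xx} - v_x^2 \right) = -\frac{1}{2} b^2 \sigma^2 \frac{z_{xx}}{z}. \end{equation*}
Substituting gives
\begin{equation*} \frac{z_t}{z} = q - a \frac{z_x}{z} - \frac{1}{2} b^2 \sigma^2 \frac{z_{xx}}{z}, \end{equation*}
and multiplying by $-z$ yields the linear equation above. The terminal condition $v(T, x) = Q(x)$ becomes $z(T, x) = e^{-Q(x)}$.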
Feynman-Kac formula
\begin{equation*} z(t, x) = E^0 \left[ e^{-\int_t^T q(x(\tau)) d\tau - Q(x(T))} \;\bigg|\; x(t) = x \right] \end{equation*}
Note the zero attached to $\L$ and $E$; it stands for $u = 0$, i.e., uncontrolled (or passive) dynamics. Thus, the relation
\begin{equation} \newcommand{\opt}{\text{optimal}} \newcommand{\pas}{\text{passive}} \newcommand{\tc}{\text{total cost}} E_{\opt} [ -\tc ] = \log E_{\pas} [ \exp(-\tc) ] \label{add} \end{equation}
allows us to find the value function $v(t, x)$ by simulating passive dynamics; the optimal controller is then proportional to $v_x$ as described above.
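The Feynman-Kac expectation can be approximated by plain Monte Carlo. The block below is a minimal sketch, not a tuned implementation: the function name `estimate_z`, the Euler-Maruyama discretization, and the test problem ($a = 0$, $b = 1$, $q = 0$, $Q(x) = x^2/2$) are all assumptions made here for illustration. For that test problem $x(T)$ is Gaussian under the passive dynamics, so $v$ has the closed form used for comparison.

```python
import numpy as np


def estimate_z(x0, t, T, a, b, q, Q, sigma,
               n_paths=100_000, n_steps=200, seed=0):
    """Monte Carlo estimate of z(t, x0) = E^0[exp(-int_t^T q dtau - Q(x(T)))].

    a, b, q, Q are vectorized callables; all names here are illustrative.
    """
    rng = np.random.default_rng(seed)
    dt = (T - t) / n_steps
    x = np.full(n_paths, float(x0))
    running = np.zeros(n_paths)  # accumulates the state cost along each path
    for _ in range(n_steps):
        running += q(x) * dt
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        # Euler-Maruyama step of the passive dynamics (u = 0)
        x = x + a(x) * dt + b(x) * sigma * dw
    return np.mean(np.exp(-running - Q(x)))


# Hypothetical test problem: pure diffusion with a quadratic terminal cost.
sigma, t, T, x0 = 0.5, 0.0, 1.0, 1.0
z_hat = estimate_z(
    x0, t, T,
    a=lambda x: np.zeros_like(x),
    b=lambda x: np.ones_like(x),
    q=lambda x: np.zeros_like(x),
    Q=lambda x: 0.5 * x**2,
    sigma=sigma,
)
v_hat = -np.log(z_hat)

# Closed form for this case: x(T) ~ N(x0, s2) with s2 = sigma^2 (T - t).
s2 = sigma**2 * (T - t)
v_exact = 0.5 * x0**2 / (1.0 + s2) + 0.5 * np.log(1.0 + s2)
print(v_hat, v_exact)
```

To recover the controller one would still need $v_x$, for example from a finite difference of two such estimates computed with the same seed; that step is left out of the sketch.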
Multiplicative objective
Trajectory cost
\begin{equation*} C^{u(\cdot)}(t, x) = \int_t^T c(x(\tau), u(\tau)) d\tau + Q(x(T)) \end{equation*}
Exponential cost
\begin{equation*} \J_\beta^{u(\cdot)}(t, x) = \exp \left( \beta C^{u(\cdot)}(t, x) \right) \end{equation*}
Expected exponential cost
\begin{equation*} J_\beta^{u(\cdot)}(t, x) = E \left[ \J_\beta^{u(\cdot)}(t, x) \;\bigg|\; x(t) = x \right] \end{equation*}
Exponential value function
\begin{equation*} J_\beta(t, x) = \min_{u(\cdot)} \left\{ J_\beta^{u(\cdot)}(t, x) \right\} \end{equation*}
Value function
\begin{equation*} v^\beta(t, x) = \min_{u(\cdot)} \left\{ \beta^{-1} \log E \left[ \exp \left( \beta C^{u(\cdot)}(t, x) \right) \;\bigg|\; x(t) = x \right] \right\} \end{equation*}
HJB
\begin{equation*} -v_t = \min_u \left\{ c + (a+bu)v_x + \frac{1}{2}b^2\sigma^2(v_{xx} + \beta v_x^2) \right\} \end{equation*}
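One way to see where the extra term $\beta v_x^2$ comes from is sketched below. It assumes $\beta > 0$ and that the exponential value function satisfies the dynamic programming equation for a multiplicative cost, an intermediate step not spelled out above:
\begin{equation*} -\partial_t J_\beta = \min_u \left\{ \beta c J_\beta + \L^u J_\beta \right\}, \qquad J_\beta = e^{\beta v}. \end{equation*}
The derivatives of $J_\beta$ are
\begin{equation*} \partial_t J_\beta = \beta v_t J_\beta, \qquad \partial_x J_\beta = \beta v_x J_\beta, \qquad \partial_x^2 J_\beta = \left( \beta v_{xx} + \beta^2 v_x^2 \right) J_\beta, \end{equation*}
and dividing the equation by $\beta J_\beta > 0$, which does not affect the minimization, gives the HJB equation for $v$.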
Optimal controller
\begin{equation*} u = -\sigma^2 b v_x \end{equation*}
After substituting the optimal controller into the HJB equation
\begin{equation*} -\p_t v = q + \L^0 v + \frac{\beta-1}{2} b^2 \sigma^2 v_x^2 \end{equation*}
Let $\beta = 1$; the quadratic term then vanishes and the equation becomes linear:
\begin{equation*} -\p_t v = q + \L^0 v \end{equation*}
By Feynman-Kac, the solution is
\begin{equation*} v(t, x) = E^0 \left[ \int_t^T q(x(\tau)) d\tau + Q(x(T)) \;\bigg|\; x(t) = x \right] \end{equation*}
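As a quick sanity check, consider a special case chosen here for illustration (it is not part of the derivation above): $a = 0$, $b = 1$, $q = 0$, and $Q(x) = x^2/2$. Under the passive dynamics $dx = \sigma dw$, the terminal state is Gaussian, $x(T) \sim N(x, \sigma^2 (T - t))$, so
\begin{equation*} v(t, x) = E^0 \left[ \frac{x(T)^2}{2} \;\bigg|\; x(t) = x \right] = \frac{1}{2} \left( x^2 + \sigma^2 (T - t) \right). \end{equation*}
This satisfies the linear equation, since $-v_t = \frac{1}{2}\sigma^2$ and $\L^0 v = \frac{1}{2}\sigma^2 v_{xx} = \frac{1}{2}\sigma^2$, and the corresponding controller is $u = -\sigma^2 b v_x = -\sigma^2 x$.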
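In other words, since for $\beta = 1$ the value function $v$ is the logarithm of the expected exponentiated cost under the optimal controller,
\begin{equation} E_{\pas} [ \tc ] = \log E_{\opt} [ \exp(\tc) ] \label{mul} \end{equation}
Comparing Formulas \eqref{add} and \eqref{mul}, we see that the labels ‘optimal’ and ‘passive’ switch roles, and the total cost enters with the opposite sign.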