LQR and the golden ratio

13 May 2018

There are many related formulations of LQR, but we will focus on two: finite horizon and infinite horizon. The optimal feedback law for the finite horizon problem is time-dependent, whereas for the infinite horizon problem, it is time-independent. To save memory and computation, people often take the easier to compute infinite horizon optimal controller (steady-state controller) and apply it to a finite horizon problem. How much performance does one lose by doing so? To be fair, however, we should compare the steady-state controller not to the time-varying optimal LQR controller but to the optimal time-invariant LQR controller for a finite horizon problem. Of course, time-varying controller will work even better, but we want to avoid having a separate gain matrix for each time step (this is why we are using the steady-state controller in the first place).

Dynamics

We do not need to solve the problem in full generality to develop working intuition. Consider the following 1D discrete-time dynamical system \begin{equation*} x_0 = 1, \quad x_{n+1} = x_n + u_n. \end{equation*} Assume that we want to find a parameterized controller in the form $u_n = kx_n$. Substituting it into the dynamics, we can write \begin{equation*} x_{n} = (1 + k)^n. \end{equation*} Can you guess the optimal $k$ that minimizes a quadratic cost for this dynamics?

Cost

To evaluate the quality of the controller, we use the standard LQR objective \begin{equation*} J_N = \sum_{n=0}^N (x_n^2 + u_n^2), \end{equation*} which upon substitution of the controller and the dynamics becomes \begin{equation*} J_N(k) = (1 + k^2) \sum_{n=0}^N (1 + k)^{2n}. \end{equation*} Assuming $k \in (-2, 0)$, we can find the steady-state cost by summing the geometric series \begin{equation*} J_\infty(k) = \lim_{N \to \infty} J_N(k) = \frac{1 + k^2}{1 - (1+k)^2} = \frac{k^2 + 1}{-k(k + 2)}. \end{equation*}

Suboptimality

We want to compare optimal $k$’s and $J$’s for different values of $N$. Let $k_N$ denote the optimal gain with respect to the objective $J_N$. The main question of this post is how big is the gap in performance between the optimal infinite horizon controller $k_\infty$ and the optimal finite horizon controller $k_N$ when compared on a finite horizon problem. Thus, we need to find the suboptimality gap \begin{equation*} \delta_N = J_N(k_\infty) - J_N(k_N). \end{equation*} We can easily find $k_\infty$ by analytically minimizing $J_\infty$, and of course we can find $k_N$ in the same way for any fixed N. But unfortunately, I don’t see an easy way to express $k_N$ as a function of $N$. Therefore, we have to be satisfied with finding $\delta_N$ numerically for a couple of $N$’s to get a feeling of how the horizon length influences the performance.

Golden ratio

The minimum of $J_\infty(k)$ in the range $k \in (-2, 0)$ lies at $k_\infty$ with \begin{equation*} k_\infty = \frac{1-\sqrt{5}}{2}, \quad J_\infty(k_\infty) = \frac{1+\sqrt{5}}{2}. \end{equation*} Wow! Did you expect the golden ratio to be the optimal cost for this LQR problem? We are not the first to spot the deep connection between LQR and Fibonacci sequences.

Comparison

Sadly, real beauty appears only in the limit. For finite $N$, I got this table for you \begin{align*} k_0 &= 0 & J_0(k_0) &= 1 & J_0(k_\infty) &\approx 1.38 & \delta_0 &\approx 0.38 \newline k_1 &= -0.5 & J_1(k_1) &\approx 1.56 & J_1(k_\infty) &\approx 1.58 & \delta_1 &\approx 0.02 \newline k_2 &\approx -0.6 & J_2(k_2) &\approx 1.612 & J_2(k_\infty) &\approx 1.613 & \delta_2 &\approx 0.001 \newline k_\infty &\approx -0.618 & J_\infty(k_\infty) &\approx 1.618 & J_\infty(k_\infty) &\approx 1.618 & \delta_\infty &= 0 \end{align*} The dynamics of the optimal gain $k_N$ and the optimal cost $J_N(k_N)$ are truly remarkable. They start at $(k,J) = (0, 1)$ for $N=0$ and then proceed each towards the nearest root of the golden ratio equation as $N$ goes to infinity. What concerns the suboptimality gap, it decreases very rapidly; it would be interesting to derive the precise rate of convergence, but let’s leave it for another day.

Conclusion

Although the numbers are specific to this toy example, the main message is more broadly valid: the difference in performance between the steady-state LQR controller and the optimal time-invariant finite horizon controller goes to zero very quickly because with every time step we are adding a term to a geometric series. Given that it is very easy to find the steady-state controller, there seems to be no reason to optimize a time-invariant linear feedback controller on a finite-horizon problem. That is something worth considering in RL too.

Boris Belousov