# LQR and the golden ratio

13 May 2018There are many related formulations of LQR,
but we will focus on two: finite horizon and infinite horizon.
The optimal feedback law for the finite horizon problem is time-dependent,
whereas for the infinite horizon problem, it is time-independent.
To save memory and computation, people often take the easier to compute
infinite horizon optimal controller (steady-state controller) and apply it
to a finite horizon problem.
How much performance does one lose by doing so?
To be fair, however, we should compare the steady-state controller
not to the time-varying optimal LQR controller but to the
optimal *time-invariant* LQR controller for a finite horizon problem.
Of course, time-varying controller will work even better,
but we want to avoid having a separate gain matrix for each time step
(this is why we are using the steady-state controller in the first place).

### Dynamics

We do not need to solve the problem in full generality to develop working intuition. Consider the following 1D discrete-time dynamical system \begin{equation*} x_0 = 1, \quad x_{n+1} = x_n + u_n. \end{equation*} Assume that we want to find a parameterized controller in the form $u_n = kx_n$. Substituting it into the dynamics, we can write \begin{equation*} x_{n} = (1 + k)^n. \end{equation*} Can you guess the optimal $k$ that minimizes a quadratic cost for this dynamics?

### Cost

To evaluate the quality of the controller, we use the standard LQR objective \begin{equation*} J_N = \sum_{n=0}^N (x_n^2 + u_n^2), \end{equation*} which upon substitution of the controller and the dynamics becomes \begin{equation*} J_N(k) = (1 + k^2) \sum_{n=0}^N (1 + k)^{2n}. \end{equation*} Assuming $k \in (-2, 0)$, we can find the steady-state cost by summing the geometric series \begin{equation*} J_\infty(k) = \lim_{N \to \infty} J_N(k) = \frac{1 + k^2}{1 - (1+k)^2} = \frac{k^2 + 1}{-k(k + 2)}. \end{equation*}

### Suboptimality

We want to compare optimal $k$’s and $J$’s for different values of $N$. Let $k_N$ denote the optimal gain with respect to the objective $J_N$. The main question of this post is how big is the gap in performance between the optimal infinite horizon controller $k_\infty$ and the optimal finite horizon controller $k_N$ when compared on a finite horizon problem. Thus, we need to find the suboptimality gap \begin{equation*} \delta_N = J_N(k_\infty) - J_N(k_N). \end{equation*} We can easily find $k_\infty$ by analytically minimizing $J_\infty$, and of course we can find $k_N$ in the same way for any fixed N. But unfortunately, I don’t see an easy way to express $k_N$ as a function of $N$. Therefore, we have to be satisfied with finding $\delta_N$ numerically for a couple of $N$’s to get a feeling of how the horizon length influences the performance.

### Golden ratio

The minimum of $J_\infty(k)$ in the range $k \in (-2, 0)$ lies at $k_\infty$ with \begin{equation*} k_\infty = \frac{1-\sqrt{5}}{2}, \quad J_\infty(k_\infty) = \frac{1+\sqrt{5}}{2}. \end{equation*} Wow! Did you expect the golden ratio to be the optimal cost for this LQR problem? We are not the first to spot the deep connection between LQR and Fibonacci sequences.

### Comparison

Sadly, real beauty appears only in the limit. For finite $N$, I got this table for you \begin{align*} k_0 &= 0 & J_0(k_0) &= 1 & J_0(k_\infty) &\approx 1.38 & \delta_0 &\approx 0.38 \newline k_1 &= -0.5 & J_1(k_1) &\approx 1.56 & J_1(k_\infty) &\approx 1.58 & \delta_1 &\approx 0.02 \newline k_2 &\approx -0.6 & J_2(k_2) &\approx 1.612 & J_2(k_\infty) &\approx 1.613 & \delta_2 &\approx 0.001 \newline k_\infty &\approx -0.618 & J_\infty(k_\infty) &\approx 1.618 & J_\infty(k_\infty) &\approx 1.618 & \delta_\infty &= 0 \end{align*} The dynamics of the optimal gain $k_N$ and the optimal cost $J_N(k_N)$ are truly remarkable. They start at $(k,J) = (0, 1)$ for $N=0$ and then proceed each towards the nearest root of the golden ratio equation as $N$ goes to infinity. What concerns the suboptimality gap, it decreases very rapidly; it would be interesting to derive the precise rate of convergence, but let’s leave it for another day.

### Conclusion

Although the numbers are specific to this toy example, the main message is more broadly valid: the difference in performance between the steady-state LQR controller and the optimal time-invariant finite horizon controller goes to zero very quickly because with every time step we are adding a term to a geometric series. Given that it is very easy to find the steady-state controller, there seems to be no reason to optimize a time-invariant linear feedback controller on a finite-horizon problem. That is something worth considering in RL too.