# Recent posts

- 12 Nov 2018 The Chow-Robbins game with an unknown coin — The Chow-Robbins game is simple to describe: toss a fair coin repeatedly and stop whenever you want, receiving as a reward the proportion of heads accrued at the time you stop. Surprisingly, the exact expected payoff of this innocent-looking game remains unknown to this day, although efficient numerical methods and tight bounds exist. This note shows how a related problem, in which the coin bias is not known in advance, can be treated in a similar manner using dynamic programming.
- 23 Oct 2018 EVaR vs MaxEnt vs BI vs EM for PS — This note describes four approaches to policy search (PS), all of which can be understood as injecting optimism to facilitate exploration.
- 13 Oct 2018 The $\alpha$-Gaussian — The Gaussian distribution is known to be the maximum entropy distribution with fixed first and second moments. What is the distribution that maximizes the $\alpha$-entropy? The $\alpha$-Gaussian?
- 13 May 2018 LQR and the golden ratio — How suboptimal is the steady-state LQR feedback controller when applied for a finite amount of time? The golden ratio will provide an answer.
- 10 May 2018 EM as KL minimization — EM is an algorithm for ML estimation, so it obviously minimizes a KL. Although the interpretation presented here is equivalent to the one given by Neal and Hinton (1998) in terms of free energy, it may be easier to follow for the uninitiated.
- 06 May 2018 Forward kinematics by torus slicing — Even the simplest two-link planar robot arm has no unique inverse kinematics. To understand where exactly the non-uniqueness comes from, look into the geometry of the configuration space.
- 03 May 2018 Why is there a minus in the Lagrangian? — In the Lagrangian formulation of classical mechanics, there is a minus sign between the kinetic and the potential energy. On the other hand, in the Hamiltonian formulation there is a plus. Often, in reinforcement learning, we interpret a cost function as a kind of energy to be minimized. So, why do we always add terms and never subtract?
- 03 May 2018 Differentiable line segment vs disk (or circle) collision detection — It is straightforward to check whether a line intersects a disk. It is a bit harder with a line segment and a circle. And it becomes really interesting if you want to express such a condition as a differentiable inequality.
- 30 Mar 2018 Logistic regression as KL minimization — A minute of thought over a glass of wine reveals that the logistic regression objective is the KL divergence between a parametrized data distribution and an empirical one.
- 05 Mar 2018 Risk-averse linearly-solvable control — A risk-averse counterpart to the Kappen-Todorov linearly solvable control framework.
- 05 Mar 2018 Feynman-Kac formula — Summary with a view towards applications in controlled diffusion.
- 16 Jan 2018 Linearly constrained relative entropy minimization — Policy improvement by moment matching with minimal novelty.
- 06 Jan 2018 Expectation through CDF — A beautiful formula for computing expectations using CDF instead of PDF.
- 01 Jan 2018 Entropic proximal policy search — A generic algorithm.
- 01 Jan 2018 Discounting in ergodic MDPs — Does discounting make sense when there is no time?
- 30 Dec 2017 Ergodic policy search as a linear program — Summary.
- 23 Nov 2017 Deriving the HJB equation — This post provides an informal derivation of the Hamilton-Jacobi-Bellman equation that does not explicitly rely on Bellman's optimality principle.
- 08 Oct 2017 Smoothing and differentiation — When minimizing a noisy or highly oscillating function, it is reasonable to smooth it before computing the derivative. Linear smoothing and differentiation commute, so their order can be switched. Using the reparameterization trick to shift the dependence on the parameters from the function to the smoothing kernel, one can compute the derivative of the smoothed function even when the gradient of the function itself is not available.
- 10 Sep 2017 Gaussian process vs kernel ridge regression — We have seen in the previous post that estimating the Gaussian conditional mean is the same as performing linear regression. In this post, we will show that kernel ridge regression plays the role of linear regression for Gaussian processes.
- 08 Sep 2017 Gaussian conditioning vs linear regression — Modeling inputs and outputs as jointly Gaussian and then conditioning on the inputs to predict expected outputs is equivalent to plain linear regression.
- 18 Aug 2017 $\alpha$-Divergence between Gaussians — Deriving the explicit formula for the $\alpha$-divergence between two univariate Gaussians.
- 12 Aug 2017 Change of variables and necessary conditions for optimality — When minimizing a multivariate function, new variables are often introduced to simplify calculations. However, not every change of variables is equally good.
- 10 Aug 2017 TD error vs advantage vs Bellman error — Clarifying the definitions of the TD error, the advantage, and the Bellman error in RL.
- 22 Jul 2017 All smooth $f$-divergences are locally the same — It is widely known that the KL divergence and the reverse KL locally look like the Fisher metric. This result can be generalized to any twice differentiable $f$-divergence.
- 11 Jul 2017 Geodesic distance between probability distributions is not the KL divergence — The Fisher metric allows one to compute the geodesic distance between probability distributions by line integration. Although the KL locally coincides with the Fisher metric, it is not the geodesic distance.
- 08 Jul 2017 The EM algorithm — A brief summary.
- 12 Jun 2017 Inertia tensor under affine change of basis — The inertia tensor defines a well-known bilinear form, the kinetic energy; hence, its coordinates transform under rotations as those of a bilinear form. Translations of the basis are described by the parallel axis theorem. By combining rotation and translation, one can express the inertia tensor in any basis given its coordinates in the frame located at the center of mass.
- 29 Apr 2017 Discounted reward vs average reward reinforcement learning — There are two approaches to formulating the goal in infinite-horizon MDPs. One is to introduce a discount factor and maximize the expected sum of discounted future rewards. The other is to maximize the expected reward under the stationary state distribution, assuming ergodicity. It is easy to show that the two approaches are equivalent.
- 23 Apr 2017 KL minimization vs maximum likelihood estimation — Maximum likelihood estimation can be seen as minimization of the KL divergence from a parametric distribution to an empirical distribution.
- 16 Apr 2017 Many faces of entropy — The definition of entropy should be explicit about the base measure. Then information-theoretic concepts such as KL divergence, joint entropy, and mutual information naturally follow.
- 16 Apr 2017 Bregman divergence of $\alpha$-divergence — The Bregman divergence of the KL divergence is the KL divergence again. What is the Bregman divergence of the $\alpha$-divergence? Is there another stable point of the Bregman divergence?
- 16 Apr 2017 KL between trajectory distributions vs KL between policies — The KL divergence between trajectory distributions nicely decomposes into a sum of KL divergences between policies over time and state.
- 27 Dec 2016 Between line and parabola — What do functions between $y = x$ and $y = x^2$ look like? That's not so straightforward, because the ordinary definition $y = x^\alpha = \exp(\alpha \log x)$ doesn't work for negative $x$.
- 27 Dec 2016 Determinant of exponential and exponent of adjoint — Two notable equalities from Lie theory.
- 23 Dec 2016 Circle definitions — Generative models are similar in spirit to the parametric definition of a circle. Discriminative models are akin to the definition through constraints. Are there other ways to define a circle?
- 22 Dec 2016 Transition probability function — Definition and properties.
- 21 Dec 2016 Probability distribution vs cumulative distribution function — Several terms in probability theory may give the impression that they define the same thing. One should, however, carefully distinguish between abstract entities and their representations. Here is a brief summary of basic probability theory notions that clarifies the distinctions.
- 12 Dec 2016 Tensor powers at the service of humanity — Some funny identities relating tensor powers, direct products, and exponentiation.
- 03 Dec 2016 Types, morphisms, and concepts in C++ — From a set-theoretic point of view.
- 02 Dec 2016 Quaternionic change of basis — Imagine you are given coordinates of a quaternion in a coordinate frame. If you know how to get to that frame from a base frame, what are the coordinates of the quaternion in the base frame?
- 02 Dec 2016 Isometries of the $l_1^n$ space — Isometries of the Euclidean space are well-known: they are reflections, translations, and rotations. What are their analogs for the $l_1^n$ space?
- 01 Dec 2016 Distance between rotations — Engineers are used to thinking that everything is a vector in a high-dimensional vector space. Rotations, however, are better thought of as Lie groups acting on vector spaces. It is crystal clear what the distance between vectors in a Euclidean space is. The definition of distance between elements of a rotation group, on the other hand, may well surprise you.
- 01 Dec 2016 Is a Circle a kind of an Ellipse in C++? — Intuition suggests that one should inherit Circle from Ellipse to represent the relationship "is a kind of" between them. However, there are flaws in such reasoning.
- 16 Oct 2016 Fisher metric vs KL-divergence — The Fisher information metric can be viewed as the second derivative of the KL-divergence.
- 21 Aug 2016 CSS styles for blogging — Links to popular CSS frameworks.
- 29 Jul 2016 Jacobian transpose vs pseudoinverse — Understanding differences between the two inverse kinematics algorithms.
- 22 Jun 2016 Tensor product vs direct product vs Cartesian product — All of these are different ways to build complex objects from simple ones.
- 11 Jun 2016 Time value of money for engineers — Why is $1 today worth more than tomorrow? In this post, we'll see how to transform money in time, learn about the concepts of present value (PV) and future value (FV), and create a savings plan for a happy retirement.
- 31 May 2016 Change of basis vs linear transformation — Learn to discern rotation of a vector from rotation of a basis.
- 16 May 2016 Run matplotlib in a virtualenv on Ubuntu 16.04 with different backends — If figures do not show up, rebuild matplotlib from sources.
- 16 May 2016 Jekyll syntax highlighting themes — Beautiful pastel code highlighting for rouge.
- 14 May 2016 Fourier transforms explained — DFT vs DTFT vs FFT. WTF?
- 12 Apr 2016 Recover a hard-bricked OnePlus One — Reset the phone to the factory state when it does not respond to any input.
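
As a tiny taste of the posts above, here is a numerical sketch of the identity that the "Expectation through CDF" entry refers to: for a nonnegative random variable $X$ with CDF $F$, $\mathbb{E}[X] = \int_0^\infty (1 - F(x))\,dx$. The snippet below is an illustration written for this index, not code from the post; it checks the identity for the standard exponential distribution, whose mean is known to be 1.

```python
import numpy as np

# Expectation through CDF: for a nonnegative random variable X with CDF F,
#   E[X] = integral over [0, inf) of (1 - F(x)) dx.
# Sanity check for X ~ Exp(1), where 1 - F(x) = exp(-x) and E[X] = 1.

x = np.linspace(0.0, 50.0, 200_001)   # fine grid; the tail beyond 50 is negligible
survival = np.exp(-x)                 # survival function 1 - F(x) for Exp(1)

# Trapezoidal rule, written out explicitly to avoid depending on the
# np.trapz / np.trapezoid naming difference across NumPy versions
mean_via_cdf = 0.5 * np.sum((survival[:-1] + survival[1:]) * np.diff(x))

print(mean_via_cdf)  # approximately 1.0, the mean of Exp(1)
```

Integrating the survival function instead of $x\,f(x)$ is often more convenient because it needs no density at all, only the CDF.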