Fisher metric vs KL-divergence
16 Oct 2016

Let $P$ and $Q$ be probability measures over a set $X$, and let $P$ be absolutely continuous with respect to $Q$. If $\mu$ is any measure on $X$ for which the densities $\displaystyle p = \frac{\mathrm{d}P}{\mathrm{d}\mu}$ and $\displaystyle q = \frac{\mathrm{d}Q}{\mathrm{d}\mu}$ exist, then the Kullback-Leibler divergence from $Q$ to $P$ is given as

$$D_{\mathrm{KL}}(P \,\|\, Q) = \int_X p \,\log \frac{p}{q} \,\mathrm{d}\mu.$$
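As a quick numerical aside (my addition, not part of the derivation that follows), here is a minimal Python sketch of this formula for discrete distributions, where the integral reduces to a sum. The function name `kl_divergence` and the example vectors are purely illustrative, and absolute continuity is assumed ($q > 0$ wherever $p > 0$).

```python
# Minimal sketch: KL divergence between two discrete distributions on the
# same finite set, assuming q > 0 wherever p > 0 (absolute continuity).
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x p(x) * log(p(x) / q(x))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute zero by convention
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])
print(kl_divergence(p, q))  # ~ 0.0305
```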
Let the density $q = q(x, \theta)$ be parameterized by a vector $\theta$ and let $p$ be a variation of $q$, i.e., $p = q + \delta q$, where $\displaystyle \delta q = \frac{\partial q}{\partial \theta_m} \delta \theta_m$ (summation over repeated indices is implied). Since both $p$ and $q$ integrate to one, $\int_X \delta q \,\mathrm{d}\mu = 0$, and expanding the logarithm to second order in $\delta q$ gives

$$D_{\mathrm{KL}}(P \,\|\, Q) = \int_X (q + \delta q) \log\!\left(1 + \frac{\delta q}{q}\right) \mathrm{d}\mu \approx \frac{1}{2} \int_X \frac{(\delta q)^2}{q} \,\mathrm{d}\mu = \frac{1}{2}\, g_{jk}\, \delta\theta_j\, \delta\theta_k,$$
where we recognize $g_{jk}$, the Fisher information metric,

$$g_{jk} = \int_X \frac{1}{q}\,\frac{\partial q}{\partial \theta_j}\,\frac{\partial q}{\partial \theta_k}\,\mathrm{d}\mu = \int_X q\,\frac{\partial \log q}{\partial \theta_j}\,\frac{\partial \log q}{\partial \theta_k}\,\mathrm{d}\mu.$$
Thus, the Fisher information metric is the second derivative of the Kullback-Leibler divergence with respect to the parameter variation,

$$g_{jk} = \frac{\partial^2 D_{\mathrm{KL}}(P \,\|\, Q)}{\partial(\delta\theta_j)\,\partial(\delta\theta_k)}\bigg|_{\delta\theta = 0}.$$
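As a sanity check of this second-order relation (my addition, not from the original text), one can verify it numerically for a one-parameter Bernoulli family, whose Fisher information is known in closed form, $g = 1/(\theta(1-\theta))$. The helper `kl_bernoulli` and the chosen values of `theta` and `dtheta` are just for illustration.

```python
# Sketch: for a Bernoulli(theta) family, check that
# D_KL(q_{theta+dtheta} || q_theta) ~ 0.5 * g * dtheta**2 for small dtheta,
# where g = 1 / (theta * (1 - theta)) is the Fisher information.
import numpy as np

def kl_bernoulli(a, b):
    """D_KL(Bernoulli(a) || Bernoulli(b))."""
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

theta, dtheta = 0.3, 1e-2
fisher = 1.0 / (theta * (1.0 - theta))      # exact Fisher information
kl = kl_bernoulli(theta + dtheta, theta)    # exact KL divergence
print(kl, 0.5 * fisher * dtheta**2)         # ~ 2.366e-4 vs ~ 2.381e-4
```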
Bonus: one prominent equality for the Fisher information
Let’s prove the following useful equality:

$$\operatorname{E}\!\left[\frac{\partial \log q}{\partial \theta_j}\,\frac{\partial \log q}{\partial \theta_k}\right] = -\operatorname{E}\!\left[\frac{\partial^2 \log q}{\partial \theta_j\,\partial \theta_k}\right],$$

where $\operatorname{E}[\,\cdot\,] = \int_X (\,\cdot\,)\, q\,\mathrm{d}\mu$ denotes expectation with respect to $q$, so that the left-hand side is exactly $g_{jk}$.
Consider the argument on the right-hand side:

$$\frac{\partial^2 \log q}{\partial \theta_j\,\partial \theta_k} = \frac{\partial}{\partial \theta_j}\!\left(\frac{1}{q}\,\frac{\partial q}{\partial \theta_k}\right) = -\frac{\partial \log q}{\partial \theta_j}\,\frac{\partial \log q}{\partial \theta_k} + \frac{1}{q}\,\frac{\partial^2 q}{\partial \theta_j\,\partial \theta_k}.$$
Compute its expectation:

$$\operatorname{E}\!\left[\frac{\partial^2 \log q}{\partial \theta_j\,\partial \theta_k}\right] = -\operatorname{E}\!\left[\frac{\partial \log q}{\partial \theta_j}\,\frac{\partial \log q}{\partial \theta_k}\right] + \int_X \frac{\partial^2 q}{\partial \theta_j\,\partial \theta_k}\,\mathrm{d}\mu.$$
The second term on the right equals zero, since (assuming differentiation and integration may be interchanged) it is the second derivative of $\int_X q \,\mathrm{d}\mu = 1$, which concludes the proof. Derivations in this post closely follow the book by Kullback.
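For the Bernoulli family the identity can even be checked exactly, since the expectation is a finite sum over $x \in \{0, 1\}$. The sketch below (my addition; the variable names and the value of `theta` are arbitrary) evaluates both sides analytically and confirms they equal $1/(\theta(1-\theta))$.

```python
# Sketch: verify E[(d log q / d theta)^2] = -E[d^2 log q / d theta^2]
# exactly for a Bernoulli(theta) family, where
# log q(x; theta) = x log(theta) + (1 - x) log(1 - theta).
import numpy as np

theta = 0.3
xs = np.array([0.0, 1.0])
probs = np.array([1.0 - theta, theta])  # q(x; theta)

score = xs / theta - (1.0 - xs) / (1.0 - theta)               # d log q / d theta
score_deriv = -xs / theta**2 - (1.0 - xs) / (1.0 - theta)**2  # d^2 log q / d theta^2

lhs = np.sum(probs * score**2)      # E[(d log q / d theta)^2]
rhs = -np.sum(probs * score_deriv)  # -E[d^2 log q / d theta^2]
print(lhs, rhs)                     # both equal 1 / (theta * (1 - theta)) ~ 4.7619
```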