# Fisher metric vs KL-divergence

16 Oct 2016

Let $P$ and $Q$ be probability measures over a set $X$, and let $P$ be absolutely continuous with respect to $Q$. If $\mu$ is any measure on $X$ for which the densities $\displaystyle p = \frac{\mathrm{d}P}{\mathrm{d}\mu}$ and $\displaystyle q = \frac{\mathrm{d}Q}{\mathrm{d}\mu}$ exist, then the Kullback-Leibler divergence from $Q$ to $P$ is given as

$$D_{\mathrm{KL}}(P \,\|\, Q) = \int_X p \log \frac{p}{q} \, \mathrm{d}\mu.$$

Let the density $q = q(x, \theta)$ be parameterized by a vector $\theta$, and let $p$ be a variation of $q$, i.e., $p = q + \delta q$, where $\displaystyle \delta q = \frac{\partial q}{\partial \theta_m} \delta \theta_m$ (summation over repeated indices implied). Then

$$D_{\mathrm{KL}}(P \,\|\, Q) = \int_X (q + \delta q) \log \frac{q + \delta q}{q} \, \mathrm{d}\mu \approx \int_X \left( \delta q + \frac{(\delta q)^2}{2 q} \right) \mathrm{d}\mu = \frac{1}{2} \int_X \frac{(\delta q)^2}{q} \, \mathrm{d}\mu,$$

where we expanded the logarithm to second order in $\delta q / q$ and used $\int_X \delta q \, \mathrm{d}\mu = 0$, since both $p$ and $q$ integrate to one. Substituting $\displaystyle \delta q = \frac{\partial q}{\partial \theta_m} \delta \theta_m$,

$$D_{\mathrm{KL}}(P \,\|\, Q) \approx \frac{1}{2} \int_X \frac{1}{q} \frac{\partial q}{\partial \theta_j} \frac{\partial q}{\partial \theta_k} \, \mathrm{d}\mu \; \delta \theta_j \, \delta \theta_k = \frac{1}{2} \, g_{jk} \, \delta \theta_j \, \delta \theta_k,$$

where we recognize $g_{jk}$, the Fisher information metric,

$$g_{jk} = \int_X \frac{1}{q} \frac{\partial q}{\partial \theta_j} \frac{\partial q}{\partial \theta_k} \, \mathrm{d}\mu = \int_X q \, \frac{\partial \log q}{\partial \theta_j} \frac{\partial \log q}{\partial \theta_k} \, \mathrm{d}\mu = \mathbb{E}\left[ \frac{\partial \log q}{\partial \theta_j} \frac{\partial \log q}{\partial \theta_k} \right].$$

Thus, the Fisher information metric is the second derivative of the Kullback-Leibler divergence,

$$g_{jk} = \left. \frac{\partial^2 D_{\mathrm{KL}}(P \,\|\, Q)}{\partial (\delta \theta_j) \, \partial (\delta \theta_k)} \right|_{\delta \theta = 0}.$$
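This second-derivative relation is easy to check numerically for a one-parameter family. The following is a minimal sketch (my own example, not from the post) using the Bernoulli family, whose Fisher information is known in closed form as $1/(\theta(1-\theta))$; the second derivative of the KL divergence is approximated by a central finite difference.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence from Bernoulli(q) to Bernoulli(p)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

theta = 0.3
h = 1e-4

# Second derivative of KL(theta' || theta) with respect to theta'
# at theta' = theta, via a central finite difference.
second_deriv = (kl_bernoulli(theta + h, theta)
                - 2 * kl_bernoulli(theta, theta)
                + kl_bernoulli(theta - h, theta)) / h**2

# Closed-form Fisher information of the Bernoulli family.
fisher = 1 / (theta * (1 - theta))

print(second_deriv, fisher)  # the two values agree to several digits
```

The agreement is limited only by the truncation error of the finite difference, consistent with the KL divergence being locally quadratic with Hessian $g_{jk}$.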

### Bonus: one prominent equality for the Fisher information

Let's prove the following useful equality:

$$\mathbb{E}\left[ \frac{\partial \log q}{\partial \theta_j} \frac{\partial \log q}{\partial \theta_k} \right] = -\mathbb{E}\left[ \frac{\partial^2 \log q}{\partial \theta_j \, \partial \theta_k} \right].$$

Consider the argument on the right-hand side:

$$\frac{\partial^2 \log q}{\partial \theta_j \, \partial \theta_k} = \frac{\partial}{\partial \theta_j} \left( \frac{1}{q} \frac{\partial q}{\partial \theta_k} \right) = -\frac{\partial \log q}{\partial \theta_j} \frac{\partial \log q}{\partial \theta_k} + \frac{1}{q} \frac{\partial^2 q}{\partial \theta_j \, \partial \theta_k}.$$

Compute its expectation:

$$\mathbb{E}\left[ \frac{\partial^2 \log q}{\partial \theta_j \, \partial \theta_k} \right] = -\mathbb{E}\left[ \frac{\partial \log q}{\partial \theta_j} \frac{\partial \log q}{\partial \theta_k} \right] + \int_X \frac{\partial^2 q}{\partial \theta_j \, \partial \theta_k} \, \mathrm{d}\mu.$$

The second term on the right equals zero, since exchanging differentiation and integration gives $\displaystyle \frac{\partial^2}{\partial \theta_j \, \partial \theta_k} \int_X q \, \mathrm{d}\mu = \frac{\partial^2}{\partial \theta_j \, \partial \theta_k} 1 = 0$, which concludes the proof. Derivations in this post closely follow the book by Kullback.
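The equality itself can also be verified directly for a discrete family, where the expectations reduce to finite sums. A small sketch (again with the Bernoulli family as an assumed example, not taken from the post), in which both sides come out to $1/(\theta(1-\theta))$:

```python
theta = 0.3

# Bernoulli(theta): sums over x in {0, 1} replace the integrals over X.
def score(x):
    """d/dtheta of log q(x, theta) for the Bernoulli density."""
    return x / theta - (1 - x) / (1 - theta)

def score_deriv(x):
    """d^2/dtheta^2 of log q(x, theta) for the Bernoulli density."""
    return -x / theta**2 - (1 - x) / (1 - theta)**2

probs = {0: 1 - theta, 1: theta}
lhs = sum(probs[x] * score(x) ** 2 for x in (0, 1))    # E[(d log q / dtheta)^2]
rhs = -sum(probs[x] * score_deriv(x) for x in (0, 1))  # -E[d^2 log q / dtheta^2]

print(lhs, rhs)  # both equal the Fisher information 1/(theta*(1-theta))
```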