All smooth $f$-divergences are locally the same
22 Jul 2017

For discrete distributions $p$ and $q$, the $f$-divergence is defined as

$$D_f(p \,\|\, q) = \sum_i q_i \, f\!\left(\frac{p_i}{q_i}\right),$$
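where $f: (0, \infty) \to \mathbb{R}$ is a convex function satisfying $f(1) = 0$. If $p$ is a small variation of $q$, i.e., $p = q + dq$ with $\sum_i dq_i = 0$ so that both remain probability distributions, then

$$D_f(q + dq \,\|\, q) = \sum_i q_i \, f\!\left(1 + \frac{dq_i}{q_i}\right).$$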
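The definition $D_f(p \,\|\, q) = \sum_i q_i f(p_i / q_i)$ translates directly into code. Below is a minimal Python sketch, assuming every $q_i > 0$; the names `f_divergence` and `f_kl` are illustrative, and the example generator $f(x) = x \ln x$ (which yields the KL divergence) is chosen here for demonstration.

```python
import math
from typing import Callable, Sequence

def f_divergence(p: Sequence[float], q: Sequence[float],
                 f: Callable[[float], float]) -> float:
    """D_f(p || q) = sum_i q_i * f(p_i / q_i) for discrete p and q."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    if any(qi <= 0 for qi in q):
        raise ValueError("this sketch assumes every q_i > 0")
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def f_kl(x: float) -> float:
    # Convex, with f(1) = 0; generates the KL divergence KL(p || q)
    return x * math.log(x)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
kl_direct = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
print(f_divergence(p, q, f_kl), kl_direct)  # the two values agree up to rounding
```

Because $f(1) = 0$, the divergence of any distribution from itself is zero, which is easy to confirm with `f_divergence(q, q, f_kl)`.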
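Provided $f$ is twice differentiable, we can expand it in a Taylor series around $x = 1$, using $f(1) = 0$:

$$f(1 + x) = f^{\prime}(1)\, x + \frac{f^{\prime\prime}(1)}{2}\, x^2 + O(x^3),$$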
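and thus approximate the $f$-divergence by a quadratic function. The first-order term drops out because both distributions sum to one, so $\sum_i dq_i = 0$:

$$D_f(q + dq \,\|\, q) \approx f^{\prime}(1) \sum_i dq_i + \frac{f^{\prime\prime}(1)}{2} \sum_i \frac{dq_i^2}{q_i} = \frac{f^{\prime\prime}(1)}{2} \sum_i \frac{dq_i^2}{q_i}.$$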
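Comparing this with the Fisher metric $ds^2 = \sum_i \frac{dq_i^2}{q_i}$, we see that every smooth $f$-divergence induces the same quadratic form, $D_f(q + dq \,\|\, q) \approx \frac{f^{\prime\prime}(1)}{2}\, ds^2$; the choice of $f$ only changes the constant factor $f^{\prime\prime}(1)$.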
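To see the local equivalence numerically, here is a small Python check. It assumes three example generators that the post does not list (KL with $f(x) = x \ln x$, reverse KL with $f(x) = -\ln x$, squared Hellinger with $f(x) = (\sqrt{x} - 1)^2$) and an arbitrary $q$ and perturbation direction. For each, it divides the exact divergence by $\frac{f^{\prime\prime}(1)}{2} \sum_i dq_i^2 / q_i$; the ratio should approach 1 as the perturbation shrinks.

```python
import math
from typing import Callable, Dict, List, Tuple

Generator = Callable[[float], float]

def f_divergence(p: List[float], q: List[float], f: Generator) -> float:
    # D_f(p || q) = sum_i q_i f(p_i / q_i), assuming all q_i > 0
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def fisher_quadratic_form(q: List[float], dq: List[float]) -> float:
    # ds^2 = sum_i dq_i^2 / q_i
    return sum(d * d / qi for d, qi in zip(dq, q))

# Example generators (each has f(1) = 0) paired with their value of f''(1)
GENERATORS: Dict[str, Tuple[Generator, float]] = {
    "KL": (lambda x: x * math.log(x), 1.0),                     # f''(x) = 1/x
    "reverse KL": (lambda x: -math.log(x), 1.0),                # f''(x) = 1/x^2
    "sq. Hellinger": (lambda x: (math.sqrt(x) - 1) ** 2, 0.5),  # f''(x) = x^(-3/2)/2
}

q = [0.5, 0.3, 0.2]
direction = [0.1, -0.04, -0.06]  # components sum to zero, so p stays normalized

def ratios(eps: float) -> Dict[str, float]:
    """Exact D_f divided by its quadratic approximation, for p = q + eps * direction."""
    dq = [eps * d for d in direction]
    p = [qi + di for qi, di in zip(q, dq)]
    return {
        name: f_divergence(p, q, f) / (0.5 * f2 * fisher_quadratic_form(q, dq))
        for name, (f, f2) in GENERATORS.items()
    }

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, {name: round(r, 6) for name, r in ratios(eps).items()})
```

The deviation from 1 shrinks roughly in proportion to `eps`, reflecting the neglected third-order Taylor term.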
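Note that this holds only for smooth $f$-divergences; non-smooth ones can behave differently. For example, the total variation distance corresponds to

$$f_{TV}(x) = \frac{1}{2}\,|x - 1|,$$

which is not differentiable at $x = 1$ and hence not quadratic around it: locally, $D_{TV}(q + dq \,\|\, q) = \frac{1}{2} \sum_i |dq_i|$ grows linearly rather than quadratically in $dq$.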
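One way to see the difference in local behavior is to halve the perturbation and watch how each divergence shrinks. The sketch below assumes the total variation generator $f(x) = \frac{1}{2}|x - 1|$ and uses KL ($f(x) = x \ln x$) as a smooth comparison; `divergence_at`, `q`, and `direction` are illustrative choices.

```python
import math
from typing import Callable, List

def f_divergence(p: List[float], q: List[float], f: Callable[[float], float]) -> float:
    # D_f(p || q) = sum_i q_i f(p_i / q_i), assuming all q_i > 0
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def f_tv(x: float) -> float:
    # Total variation generator: has a kink (no derivative) at x = 1
    return 0.5 * abs(x - 1)

def f_kl(x: float) -> float:
    # Smooth KL generator, for comparison
    return x * math.log(x)

q = [0.5, 0.3, 0.2]
direction = [0.1, -0.04, -0.06]  # sums to zero

def divergence_at(eps: float, f: Callable[[float], float]) -> float:
    p = [qi + eps * di for qi, di in zip(q, direction)]
    return f_divergence(p, q, f)

# Halve the perturbation: a locally linear divergence shrinks by 2x, a quadratic one by 4x
tv_ratio = divergence_at(1e-3, f_tv) / divergence_at(5e-4, f_tv)
kl_ratio = divergence_at(1e-3, f_kl) / divergence_at(5e-4, f_kl)
print(f"TV shrink factor: {tv_ratio:.4f}, KL shrink factor: {kl_ratio:.4f}")
```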
Special case: $\alpha$-divergence
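The $f$-divergence with $f$ of the form

$$f_\alpha(x) = \frac{x^\alpha - \alpha x + \alpha - 1}{\alpha(\alpha - 1)}, \qquad \alpha \notin \{0, 1\},$$

with the cases $\alpha = 1$ and $\alpha = 0$ defined by continuity, $f_1(x) = x \ln x - x + 1$ and $f_0(x) = x - 1 - \ln x$, is known as the $\alpha$-divergence. Since $f_\alpha^{\prime\prime}(x) = x^{\alpha-2}$ and hence $f_\alpha^{\prime\prime}(1) = 1$, we obtain the approximation of the $\alpha$-divergence

$$D_\alpha(q + dq \,\|\, q) \approx \frac{1}{2} \sum_i \frac{dq_i^2}{q_i},$$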
which directly generalizes the result of Kullback for the KL divergence and its reverse, corresponding to $\alpha = 1$ and $\alpha = 0$ respectively.
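A quick numerical sanity check of both claims, assuming the parametrization $f_\alpha(x) = \frac{x^\alpha - \alpha x + \alpha - 1}{\alpha(\alpha - 1)}$ (which is consistent with $f_\alpha^{\prime\prime}(x) = x^{\alpha-2}$) and its limits at $\alpha = 0, 1$. The helpers `f_alpha` and `alpha_divergence` and the sample distributions are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, List

def f_alpha(alpha: float, x: float) -> float:
    """alpha-divergence generator; alpha = 0 and alpha = 1 use the continuous limits."""
    if alpha == 1:
        return x * math.log(x) - x + 1
    if alpha == 0:
        return x - 1 - math.log(x)
    return (x ** alpha - alpha * x + alpha - 1) / (alpha * (alpha - 1))

def alpha_divergence(alpha: float, p: List[float], q: List[float]) -> float:
    # D_alpha(p || q) = sum_i q_i f_alpha(p_i / q_i), assuming all q_i > 0
    return sum(qi * f_alpha(alpha, pi / qi) for pi, qi in zip(p, q))

q = [0.5, 0.3, 0.2]
p = [0.6, 0.25, 0.15]
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))          # KL(p || q)
reverse_kl = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))  # KL(q || p)
print(alpha_divergence(1.0, p, q), kl)
print(alpha_divergence(0.0, p, q), reverse_kl)

# Near q, every alpha gives about half the Fisher quadratic form, since f_alpha''(1) = 1
dq = [1e-4, -4e-5, -6e-5]
p_near = [qi + di for qi, di in zip(q, dq)]
half_fisher = 0.5 * sum(d * d / qi for d, qi in zip(dq, q))
local_ratios: Dict[float, float] = {
    a: alpha_divergence(a, p_near, q) / half_fisher for a in (-1.0, 0.0, 0.5, 1.0, 2.0, 3.0)
}
print(local_ratios)
```

Special-casing $\alpha = 0$ and $\alpha = 1$ avoids the $0/0$ in the general formula; values of $\alpha$ very close to these points still work, at some cost in floating-point accuracy.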
Local approximation is exact for Pearson $\chi^2$ divergence
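Pearson $\chi^2$ divergence is the $\alpha$-divergence with $\alpha = 2$, corresponding to the generating function

$$f_2(x) = \frac{(x - 1)^2}{2},$$

so that $\sum_i q_i f_2(p_i / q_i) = \frac{1}{2} \sum_i \frac{(p_i - q_i)^2}{q_i}$, i.e., half of the classical $\chi^2$ statistic in this normalization.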
We established the following quadratic approximation of the $f$-divergence,

$$D_f(q + dq \,\|\, q) \approx \frac{f^{\prime\prime}(1)}{2} \sum_i \frac{dq_i^2}{q_i},$$
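which is valid for small $dq$. However, if we plug in large deviations $dq = p - q$, we obtain Pearson $\chi^2$ divergence (scaled by $f^{\prime\prime}(1)$),

$$\frac{f^{\prime\prime}(1)}{2} \sum_i \frac{(p_i - q_i)^2}{q_i} = f^{\prime\prime}(1) \sum_i q_i\, f_2\!\left(\frac{p_i}{q_i}\right).$$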
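Thus, Pearson $\chi^2$ divergence is what one obtains by extending the Fisher metric's quadratic form, with the metric evaluated at $q$, from an infinitesimal neighborhood of $q$ to the whole space. Consequently, the local quadratic approximation is exact for Pearson $\chi^2$ divergence: $f_2$ is itself quadratic, so its Taylor series terminates after the second-order term, and $f_2^{\prime\prime}(1) = 1$.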
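To illustrate exactness, the sketch below compares the $\alpha = 2$ divergence with the quadratic form $\frac{1}{2} \sum_i (p_i - q_i)^2 / q_i$ for a $p$ that is deliberately far from $q$; KL ($f(x) = x \ln x$, an example generator chosen here) is included as a contrast, since its quadratic approximation is only local. The distributions and helper names are illustrative.

```python
import math
from typing import Callable, List

def f2(x: float) -> float:
    # Generator of the alpha = 2 divergence (the Pearson chi^2 scaling used above)
    return 0.5 * (x - 1) ** 2

def f_kl(x: float) -> float:
    # Smooth KL generator, for contrast
    return x * math.log(x)

def f_divergence(p: List[float], q: List[float], f: Callable[[float], float]) -> float:
    # D_f(p || q) = sum_i q_i f(p_i / q_i), assuming all q_i > 0
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

q = [0.5, 0.3, 0.2]
p = [0.1, 0.2, 0.7]  # a large deviation: p is far from q

# Fisher quadratic form at q, applied to the finite displacement dq = p - q
half_fisher = 0.5 * sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

pearson = f_divergence(p, q, f2)  # matches half_fisher up to rounding
kl = f_divergence(p, q, f_kl)     # the local approximation breaks down here
print(f"chi^2: {pearson:.6f}  KL: {kl:.6f}  quadratic form: {half_fisher:.6f}")
```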