Bregman divergence of alpha-divergence
16 Apr 2017

Many commonly used divergences between probability measures are special cases of the $f$-divergence.
$f$-Divergence
For a convex function $f \colon (0, \infty) \to \R$ with $f(1) = 0$, the $f$-divergence of a measure $\pi$ from a measure $\mu$ with $\pi \ll \mu$ is defined as
\begin{equation}
D_f(\pi || \mu) \triangleq \int f\left(\frac{d\pi}{d\mu}\right) d\mu.
\end{equation}
The measure $\mu$ is allowed to be $\sigma$-finite to include the case $\mu = \lambda$, where $\lambda$ is the Lebesgue measure.
Example. If $f(x) = x \log x - (x - 1)$, the corresponding $f$-divergence is called the KL divergence and denoted $D(\pi || \mu)$. Note that this definition is also valid for unnormalized measures.
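The KL example can be sanity-checked numerically. Below is a minimal sketch (not from the post), treating measures on a finite space as arrays of point masses, so that the integral becomes a sum; all names and values are illustrative:

```python
import numpy as np

# Discrete sanity check: the KL divergence induced by the generator
# f_1(x) = x*log(x) - (x - 1) is well defined and non-negative
# even for unnormalized measures.
def f1(x):
    return x * np.log(x) - (x - 1.0)

def kl(pi, mu):
    # pi, mu: arrays of positive point masses; neither needs to sum to 1
    return float(np.sum(f1(pi / mu) * mu))

pi = np.array([0.2, 0.5, 0.9])   # unnormalized measure
mu = np.array([0.4, 0.3, 0.8])   # unnormalized measure
print(kl(pi, mu))                # > 0
print(kl(mu, mu))                # 0.0
```

Expanding $f_1$ shows the familiar unnormalized form $\sum_i \pi_i \log(\pi_i/\mu_i) - \sum_i \pi_i + \sum_i \mu_i$.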
$\alpha$-Divergence
In this section, we introduce a one-parameter family of functions $f_\alpha,\,\alpha \in \R$, that can be used in place of $f$ in the definition of the $f$-divergence.
Notice that $f^\prime = \log$ for the $f$ corresponding to the KL divergence. If we replace $\log$ by the $\alpha$-logarithm
\begin{equation}
\log_\alpha(x) \triangleq \frac{x^{\alpha - 1} - 1}{\alpha - 1}, \qquad \alpha \neq 1,
\end{equation}
and integrate, we obtain the $\alpha$-function
\begin{equation}
f_\alpha(x) \triangleq \frac{x^\alpha - \alpha x + \alpha - 1}{\alpha (\alpha - 1)},
\end{equation}
where we choose the constant of integration to fulfill $f_\alpha(1) = 0$. In the limit $\alpha \to 1$ we recover $\log_1 = \log$ and $f_1(x) = x \log x - (x - 1)$. The $f$-divergence $D_{f_\alpha}$ corresponding to $f_\alpha$ is called the $\alpha$-divergence.
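These one-parameter functions are easy to probe numerically. A sketch, assuming the Tsallis-style $\alpha$-logarithm $\log_\alpha(x) = (x^{\alpha-1} - 1)/(\alpha - 1)$ and its antiderivative $f_\alpha(x) = (x^\alpha - \alpha x + \alpha - 1)/(\alpha(\alpha - 1))$:

```python
import numpy as np

# Sketch of the alpha-logarithm and alpha-function for a != 0, 1.
def log_alpha(x, a):
    return (x**(a - 1.0) - 1.0) / (a - 1.0)

def f_alpha(x, a):
    # antiderivative of log_alpha with the constant chosen so f_alpha(1) = 0
    return (x**a - a * x + a - 1.0) / (a * (a - 1.0))

print(f_alpha(1.0, 0.5))          # 0.0 by the choice of constant
# as a -> 1, f_alpha approaches the KL generator x*log(x) - (x - 1)
print(f_alpha(2.5, 1.0 + 1e-6), 2.5 * np.log(2.5) - 1.5)
```

Differentiating `f_alpha` numerically recovers `log_alpha`, confirming the integration constant only shifts the value at $x = 1$.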
Bregman divergence

A differentiable convex function $f$ generates the Bregman divergence
\begin{equation}
d_f(y, x) \triangleq f(y) - f(x) - f^\prime(x) (y - x).
\end{equation}

Example. $f_1(x)$ generates $d_{f_1} (y, x) = f_1(\frac{y}{x}) x = y \log \frac{y}{x} - (y - x)$.

Example. $f_\alpha(x)$ generates $d_{f_\alpha} (y, x) = f_\alpha(\frac{y}{x}) x^\alpha$.
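The scalar identity in the second example can be verified directly. A sketch (values are illustrative), assuming $f_\alpha(x) = \frac{x^\alpha - \alpha x + \alpha - 1}{\alpha(\alpha - 1)}$ and the standard Bregman formula $d_f(y, x) = f(y) - f(x) - f^\prime(x)(y - x)$:

```python
# Check that the Bregman divergence generated by f_alpha
# satisfies d_{f_alpha}(y, x) = f_alpha(y/x) * x**alpha.
def f_alpha(x, a):
    return (x**a - a * x + a - 1.0) / (a * (a - 1.0))

def log_alpha(x, a):
    # derivative of f_alpha, i.e. the alpha-logarithm
    return (x**(a - 1.0) - 1.0) / (a - 1.0)

def bregman(y, x, a):
    return f_alpha(y, a) - f_alpha(x, a) - log_alpha(x, a) * (y - x)

for a in (-0.5, 0.5, 2.5):
    y, x = 1.7, 0.6
    print(bregman(y, x, a), f_alpha(y / x, a) * x**a)  # pairs agree
```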
Bregman divergence generated by $f$-divergence
For fixed $\mu$ and $f$, the $f$-divergence $D_f(\cdot || \mu)$ is a function of $\pi$, called the $f$-divergence from $\mu$. Since $f$ is convex, $D_f(\cdot || \mu)$ is a convex functional of $\pi$ and therefore a valid generator of a Bregman divergence. Writing $p = \frac{d\pi}{d\mu}$ and $r = \frac{d\rho}{d\mu}$ for the densities, the Bregman divergence it generates is
\begin{equation}
\label{eq:bregman_f}
d_{D_f(\cdot || \mu)}(\pi, \rho) = \int \left[ f(p) - f(r) - f^\prime(r) (p - r) \right] d\mu = \int d_f(p, r) \, d\mu.
\end{equation}
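On a finite space this construction can be checked by hand. A sketch (an assumption of this illustration, not the post's derivation): measures are arrays of point masses, the gradient of $\pi \mapsto D_f(\pi || \mu)$ at $\rho$ is $f^\prime(\rho/\mu)$ componentwise, and we use the simple convex generator $f(x) = (x - 1)^2$, for which $d_f(y, x) = (y - x)^2$:

```python
import numpy as np

# Discrete check that the functional Bregman divergence of
# F(pi) = D_f(pi || mu) reduces to sum_i d_f(p_i, r_i) * mu_i
# with densities p = pi/mu, r = rho/mu.  Here f(x) = (x - 1)**2.
f = lambda x: (x - 1.0)**2
f_prime = lambda x: 2.0 * (x - 1.0)

def D_f(pi, mu):
    return np.sum(f(pi / mu) * mu)

mu  = np.array([0.5, 1.0, 0.7])
pi  = np.array([0.3, 0.8, 1.1])
rho = np.array([0.6, 0.9, 0.4])

lhs = D_f(pi, mu) - D_f(rho, mu) - np.sum(f_prime(rho / mu) * (pi - rho))
rhs = np.sum((pi / mu - rho / mu)**2 * mu)   # integral of d_f(p, r) d(mu)
print(lhs, rhs)                               # equal
```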
KL generates KL

Theorem. For any $\sigma$-finite measure $\mu$, the KL divergence from $\mu$ generates the KL divergence itself:
\begin{equation}
d_{D(\cdot || \mu)}(\pi, \rho) = D(\pi || \rho).
\end{equation}

Proof. Substituting $d_{f_1}$ from the example above into \eqref{eq:bregman_f}, we obtain
\begin{equation}
d_{D(\cdot || \mu)}(\pi, \rho) = \int f_1\left(\frac{d\pi / d\mu}{d\rho / d\mu}\right) \frac{d\rho}{d\mu} \, d\mu = \int f_1\left(\frac{d\pi}{d\rho}\right) d\rho = D(\pi || \rho). \qquad \Box
\end{equation}
The theorem shows that the KL divergence from $\mu$ is a very special generator of a Bregman divergence: it is a fixed point of the Bregman transform. To state this more formally, consider the function $D_f \colon (\mu, \pi) \mapsto D_f(\pi || \mu)$ and the Bregman transform $d \colon D_f(\cdot || \mu) \mapsto d_{D_f(\cdot || \mu)}$. The theorem asserts that $d(D(\cdot || \mu)) = D$ for any $\sigma$-finite measure $\mu$, where $D \triangleq D_{f_1}$ is the KL divergence.
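The fixed-point property can be checked on a finite space. A sketch (not the post's proof), treating measures as arrays of point masses and using the analytic gradient $\log(\rho/\mu)$ of $D(\cdot || \mu)$ at $\rho$:

```python
import numpy as np

# Discrete check that the Bregman divergence generated by
# F(pi) = D(pi || mu) is D(pi || rho), with mu dropping out.
def f1(x):
    return x * np.log(x) - (x - 1.0)

def kl(pi, mu):
    return float(np.sum(f1(pi / mu) * mu))

mu  = np.array([0.5, 1.0, 0.7])
pi  = np.array([0.3, 0.8, 1.1])
rho = np.array([0.6, 0.9, 0.4])

# gradient of D(. || mu) at rho is f1'(rho/mu) = log(rho/mu)
breg = kl(pi, mu) - kl(rho, mu) - np.sum(np.log(rho / mu) * (pi - rho))

mu2 = np.array([1.2, 0.4, 2.0])   # a different reference measure
breg2 = kl(pi, mu2) - kl(rho, mu2) - np.sum(np.log(rho / mu2) * (pi - rho))

print(breg, breg2, kl(pi, rho))   # all three equal
```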
$\alpha$ Generates $\beta$

The $\beta$-divergence between measures $\pi$ and $\rho$ with densities $p = \frac{d\pi}{d\lambda}$ and $r = \frac{d\rho}{d\lambda}$ with respect to a ground measure $\lambda$ is defined as
\begin{equation}
D_\beta(\pi || \rho) \triangleq \int \frac{p^\beta + (\beta - 1) r^\beta - \beta p r^{\beta - 1}}{\beta (\beta - 1)} \, d\lambda = \int f_\beta\left(\frac{p}{r}\right) r^\beta \, d\lambda.
\end{equation}
Usually $\lambda$ is taken to be the Lebesgue measure. The $\beta$-divergence is not an $f$-divergence.

Theorem. The $\alpha$-divergence from $\mu$ generates the $\beta$-divergence with $\beta = \alpha$ and ground measure $\lambda = \mu$:
\begin{equation}
d_{D_{f_\alpha}(\cdot || \mu)}(\pi, \rho) = D_\beta(\pi || \rho).
\end{equation}

Proof. Substituting $d_{f_\alpha}$ into \eqref{eq:bregman_f}, we obtain
\begin{equation}
d_{D_{f_\alpha}(\cdot || \mu)}(\pi, \rho) = \int f_\alpha\left(\frac{d\pi / d\mu}{d\rho / d\mu}\right) \left(\frac{d\rho}{d\mu}\right)^{\alpha} d\mu,
\end{equation}
which is the $\beta$-divergence with $\beta = \alpha$ and $\lambda = \mu$. $\Box$
The theorem asserts that the $\beta$-divergence is the Bregman divergence generated by the $\alpha$-divergence. Note that the ground measure $\mu$ plays a role here, in contrast to the case of the KL divergence, for which the Bregman divergence was independent of $\mu$.
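A discrete check of this theorem (a sketch, not the post's derivation): measures are arrays of point masses with the counting measure as $\lambda$, the gradient of $\pi \mapsto D_{f_\alpha}(\pi || \mu)$ at $\rho$ is $\log_\alpha(\rho/\mu)$ componentwise, and $f_\alpha(x) = (x^\alpha - \alpha x + \alpha - 1)/(\alpha(\alpha - 1))$ as above:

```python
import numpy as np

# Discrete check that the Bregman divergence generated by the
# alpha-divergence D_alpha(. || mu) is the beta-divergence with
# beta = alpha and ground measure mu.
a = 1.5

def f_alpha(x):
    return (x**a - a * x + a - 1.0) / (a * (a - 1.0))

def log_alpha(x):          # derivative of f_alpha
    return (x**(a - 1.0) - 1.0) / (a - 1.0)

def D_alpha(pi, mu):
    return np.sum(f_alpha(pi / mu) * mu)

def D_beta(pi, rho, lam):  # beta-divergence with ground measure lam, beta = a
    p, r = pi / lam, rho / lam
    return np.sum((p**a + (a - 1.0) * r**a - a * p * r**(a - 1.0))
                  / (a * (a - 1.0)) * lam)

mu  = np.array([0.5, 1.0, 0.7])
pi  = np.array([0.3, 0.8, 1.1])
rho = np.array([0.6, 0.9, 0.4])

breg = D_alpha(pi, mu) - D_alpha(rho, mu) - np.sum(log_alpha(rho / mu) * (pi - rho))
print(breg, D_beta(pi, rho, mu))   # equal; mu enters as the ground measure
```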
$\beta$ Generates $\beta$
Theorem. For any measure $\mu$, the $\beta$-divergence from $\mu$ generates the $\beta$-divergence itself:
\begin{equation}
d_{D_\beta(\cdot || \mu)}(\pi, \rho) = D_\beta(\pi || \rho).
\end{equation}

Proof. By performing the same manipulations as before, with $p$, $r$, $m$ denoting the densities of $\pi$, $\rho$, $\mu$ with respect to $\lambda$, we obtain
\begin{equation}
d_{D_\beta(\cdot || \mu)}(\pi, \rho) = \int \left[ \frac{p^\beta - r^\beta - \beta m^{\beta - 1} (p - r)}{\beta (\beta - 1)} - \frac{r^{\beta - 1} - m^{\beta - 1}}{\beta - 1} (p - r) \right] d\lambda = \int \frac{p^\beta + (\beta - 1) r^\beta - \beta p r^{\beta - 1}}{\beta (\beta - 1)} \, d\lambda,
\end{equation}
where all terms involving $m$ cancel, and the right-hand side is $D_\beta(\pi || \rho)$. $\Box$
Thus, the $\beta$-divergence is stable under the Bregman transform, just like the KL divergence, and in both cases the result does not depend on $\mu$.
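As a final sanity check, this self-generation can also be verified on a finite space. A sketch (counting measure as $\lambda$, point masses as arrays; the gradient of $\pi \mapsto D_\beta(\pi || \mu)$ at $\rho$ is $(\rho^{\beta-1} - \mu^{\beta-1})/(\beta - 1)$ componentwise):

```python
import numpy as np

# Discrete check that the beta-divergence from mu generates itself:
# the Bregman divergence of F(pi) = D_beta(pi || mu) is D_beta(pi || rho).
b = 1.5

def D_beta(pi, rho):
    return np.sum((pi**b + (b - 1.0) * rho**b - b * pi * rho**(b - 1.0))
                  / (b * (b - 1.0)))

def grad_D_beta(rho, mu):
    # gradient of D_beta(. || mu) at rho
    return (rho**(b - 1.0) - mu**(b - 1.0)) / (b - 1.0)

mu  = np.array([0.5, 1.0, 0.7])
pi  = np.array([0.3, 0.8, 1.1])
rho = np.array([0.6, 0.9, 0.4])

breg = D_beta(pi, mu) - D_beta(rho, mu) - np.sum(grad_D_beta(rho, mu) * (pi - rho))
print(breg, D_beta(pi, rho))   # equal: the mu-dependent terms cancel
```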