# Bregman divergence of alpha-divergence

16 Apr 2017

Many commonly used divergences between probability measures are special cases of the $f$-divergence.

## $f$-Divergence

Let $\pi$ and $\mu$ be measures on a common measurable space with $\pi \ll \mu$, and let $f \colon (0, \infty) \to \R$ be a convex function with $f(1) = 0$. The **$f$-divergence** of $\pi$ from $\mu$ is defined as

$$D_f(\pi || \mu) \triangleq \int f\!\left(\frac{d\pi}{d\mu}\right) d\mu.$$

The measure $\mu$ is allowed to be $\sigma$-finite to include the case $\mu = \lambda$, where $\lambda$ is the Lebesgue measure.

**Example.** If $f(x) = x \log x - (x - 1)$, the corresponding
$f$-divergence is called the **KL divergence** and denoted $D(\pi || \mu)$.
Note that this definition is also valid for unnormalized measures.
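As a sanity check, here is a minimal numerical sketch (the discrete weight vectors are my own illustrative choice, not from the text) confirming that the generator $f(x) = x \log x - (x - 1)$ reproduces the familiar KL sum for normalized measures:

```python
import numpy as np

# f-divergence of discrete measures with weight vectors p (pi) and m (mu)
def f_div(f, p, m):
    return float(np.sum(m * f(p / m)))

# generator of the KL divergence
f1 = lambda x: x * np.log(x) - (x - 1)

p = np.array([0.2, 0.3, 0.5])    # probability measure pi
m = np.array([0.25, 0.25, 0.5])  # reference probability measure mu

kl = float(np.sum(p * np.log(p / m)))  # familiar KL formula
assert abs(f_div(f1, p, m) - kl) < 1e-12
```

For normalized measures the correction term $-(x - 1)$ integrates to zero, which is why the two expressions agree; for unnormalized measures it contributes $\mu(X) - \pi(X)$.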

## $\alpha$-Divergence

In this section, we introduce a one-parameter family of functions $f_\alpha,\,\alpha \in \R$, that can be used in place of $f$ in the definition of the $f$-divergence.

Notice that $f^\prime = \log$ for $f$ corresponding to the KL divergence.
If we replace $\log$ by the **$\alpha$-logarithm**

$$\log_\alpha(x) \triangleq \frac{x^{\alpha - 1} - 1}{\alpha - 1}, \qquad \alpha \neq 1,$$

and set $f_\alpha^\prime = \log_\alpha$, we obtain the **$\alpha$-function**

$$f_\alpha(x) \triangleq \frac{x^\alpha - \alpha x + \alpha - 1}{\alpha (\alpha - 1)},$$

where we choose the constant of integration to fulfill $f_\alpha(1) = 0$. In the limit $\alpha \to 1$, we recover $f_1(x) = x \log x - (x - 1)$.
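A short numerical sketch of these properties, assuming the standard parameterization $f_\alpha(x) = \frac{x^\alpha - \alpha x + \alpha - 1}{\alpha(\alpha - 1)}$ (the test values are my own):

```python
import numpy as np

def f_alpha(x, a):
    # standard alpha-function, defined for a not in {0, 1}
    return (x**a - a * x + a - 1) / (a * (a - 1))

# f_alpha(1) = 0 for every admissible alpha
for a in (-0.5, 0.5, 2.0, 3.0):
    assert abs(f_alpha(1.0, a)) < 1e-12

# as alpha -> 1, f_alpha approaches the KL generator x log x - (x - 1)
x = np.linspace(0.1, 3.0, 50)
f1 = x * np.log(x) - (x - 1)
assert np.allclose(f_alpha(x, 1.0 + 1e-6), f1, atol=1e-4)
```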

## Bregman divergence

**Definition.** For a differentiable convex function $f$, the **Bregman divergence** generated by $f$ is

$$d_f(y, x) \triangleq f(y) - f(x) - f^\prime(x)(y - x).$$

**Example.** $f_1(x)$ generates $d_{f_1} (y, x) = f_1(\frac{y}{x}) x$.

**Example.** $f_\alpha(x)$ generates
$d_{f_\alpha} (y, x) = f_\alpha(\frac{y}{x}) x^\alpha$.
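The scaling identity in this example can be checked numerically from the raw Bregman definition $d_f(y, x) = f(y) - f(x) - f^\prime(x)(y - x)$; the following sketch assumes the standard $\alpha$-function and $\alpha$-logarithm forms (random test points are my own choice):

```python
import numpy as np

def f_alpha(x, a):
    return (x**a - a * x + a - 1) / (a * (a - 1))

def df_alpha(x, a):
    # f_alpha' is the alpha-logarithm (x^(a-1) - 1) / (a - 1)
    return (x**(a - 1) - 1) / (a - 1)

rng = np.random.default_rng(0)
y = rng.uniform(0.1, 3.0, 20)
x = rng.uniform(0.1, 3.0, 20)

for a in (-0.5, 0.5, 2.0, 3.0):
    # Bregman divergence from the definition f(y) - f(x) - f'(x)(y - x)
    lhs = f_alpha(y, a) - f_alpha(x, a) - df_alpha(x, a) * (y - x)
    # closed form claimed in the example
    rhs = f_alpha(y / x, a) * x**a
    assert np.allclose(lhs, rhs)
```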

## Bregman divergence generated by $f$-divergence

For fixed $\mu$ and $f$, the $f$-divergence $D_f(\cdot || \mu)$ is a function of $\pi$,

$$\pi \mapsto D_f(\pi || \mu),$$

called the **$f$-divergence from $\mu$**. Since $f$ is non-negative
and convex, $D_f(\cdot || \mu)$ is a valid generator of a Bregman divergence. Applying the Bregman construction pointwise to the densities, the resulting divergence between measures $\pi$ and $\rho$ is

\begin{equation}
d_{D_f(\cdot || \mu)}(\pi, \rho) = \int d_f\!\left(\frac{d\pi}{d\mu}, \frac{d\rho}{d\mu}\right) d\mu.
\label{eq:bregman_f}
\end{equation}
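The reduction of the functional Bregman divergence to an integrated pointwise Bregman divergence can be illustrated in the discrete case; here is a minimal sketch with the chi-squared generator $f(x) = (x - 1)^2$ as an arbitrary convex example (weights are hypothetical):

```python
import numpy as np

m = np.array([0.25, 0.25, 0.5])  # ground measure mu
p = np.array([0.2, 0.3, 0.5])    # measure pi
r = np.array([0.4, 0.4, 0.2])    # measure rho

f  = lambda x: (x - 1)**2        # chi-squared generator, f(1) = 0
df = lambda x: 2 * (x - 1)

# functional Bregman divergence of F(q) = D_f(q || m)
F = lambda q: float(np.sum(m * f(q / m)))
bregman = F(p) - F(r) - float(df(r / m) @ (p - r))

# integrated pointwise Bregman divergence of the densities
pointwise = float(np.sum(m * (f(p / m) - f(r / m) - df(r / m) * (p / m - r / m))))
assert abs(bregman - pointwise) < 1e-12
```

The design point is that the derivative of $q \mapsto \sum_i m_i f(q_i / m_i)$ in $q_i$ is exactly $f^\prime(q_i / m_i)$, so the two expressions coincide term by term.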

### KL generates KL

**Theorem.** The KL divergence from $\mu$ generates the KL divergence itself: $d_{D(\cdot || \mu)}(\pi, \rho) = D(\pi || \rho)$.

*Proof.* Substituting $d_{f_1}$ from the example above
into \eqref{eq:bregman_f}, we obtain

$$d_{D(\cdot || \mu)}(\pi, \rho) = \int f_1\!\left(\frac{d\pi / d\mu}{d\rho / d\mu}\right) \frac{d\rho}{d\mu}\, d\mu = \int f_1\!\left(\frac{d\pi}{d\rho}\right) d\rho = D(\pi || \rho). \qquad \blacksquare$$
The theorem shows that the KL divergence from $\mu$ is a very special
generator of the Bregman divergence: it is a fixed point of the Bregman
transform. To make it more formal, consider the function
$D_f \colon (\mu, \pi) \mapsto D_f(\pi || \mu)$ and the
**Bregman transform**
$d \colon D_f(\cdot || \mu) \mapsto d_{D_f(\cdot || \mu)}$.
The theorem asserts that $d(D(\cdot || \mu)) = D$ for any $\sigma$-finite
measure $\mu$, where $D \triangleq D_{f_1}$ is the KL divergence.
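This fixed-point property, including the independence from the ground measure $\mu$, can be sketched numerically in the discrete case (weight vectors are my own illustrative choice; the unnormalized KL form $\sum p \log(p/m) - p + m$ is used so the generator is well defined off the simplex):

```python
import numpy as np

m1 = np.array([0.25, 0.25, 0.5])  # one choice of ground measure mu
m2 = np.array([0.1, 0.4, 0.5])    # a different ground measure
p  = np.array([0.2, 0.3, 0.5])
r  = np.array([0.4, 0.4, 0.2])

def bregman_of_kl(p, r, m):
    # generator F(q) = D(q || m), unnormalized KL for discrete measures
    F = lambda q: float(np.sum(q * np.log(q / m) - q + m))
    grad_r = np.log(r / m)  # derivative of F at r
    return F(p) - F(r) - float(grad_r @ (p - r))

kl_p_r = float(np.sum(p * np.log(p / r) - p + r))
assert abs(bregman_of_kl(p, r, m1) - kl_p_r) < 1e-12
# the resulting divergence does not depend on the ground measure mu
assert abs(bregman_of_kl(p, r, m2) - kl_p_r) < 1e-12
```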

### $\alpha$ Generates $\beta$

**Definition.** Let $\lambda$ be a reference measure and let $p = \frac{d\pi}{d\lambda}$ and $r = \frac{d\rho}{d\lambda}$ be the densities of $\pi$ and $\rho$. The **$\beta$-divergence** is defined as

$$D_\beta(\pi || \rho) \triangleq \int \frac{p^\beta - \beta p r^{\beta - 1} + (\beta - 1) r^\beta}{\beta (\beta - 1)}\, d\lambda.$$

Usually $\lambda$ is taken to be the Lebesgue measure. In contrast to the $\alpha$-divergence, the $\beta$-divergence is not an $f$-divergence.

**Theorem.** The Bregman divergence generated by the $\alpha$-divergence from $\mu$ is the $\beta$-divergence with $\beta = \alpha$ and reference measure $\lambda = \mu$.

*Proof.* Substituting $d_{f_\alpha}$ into \eqref{eq:bregman_f}, we obtain

$$d_{D_{f_\alpha}(\cdot || \mu)}(\pi, \rho) = \int r^\alpha f_\alpha\!\left(\frac{p}{r}\right) d\mu = \int \frac{p^\alpha - \alpha p r^{\alpha - 1} + (\alpha - 1) r^\alpha}{\alpha (\alpha - 1)}\, d\mu,$$

where $p = \frac{d\pi}{d\mu}$ and $r = \frac{d\rho}{d\mu}$. This is exactly the $\beta$-divergence with $\beta = \alpha$ and $\lambda = \mu$. $\blacksquare$
The theorem asserts that the $\beta$-divergence is the Bregman divergence generated by the $\alpha$-divergence. Note that the ground measure $\mu$ plays a role here, in contrast to the case of the KL divergence, for which the Bregman divergence was independent of $\mu$.
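A discrete numerical sketch of this statement, assuming the standard $\alpha$-function and $\beta$-divergence parameterizations (weight vectors and test values of $\alpha$ are my own):

```python
import numpy as np

def f_alpha(x, a):
    return (x**a - a * x + a - 1) / (a * (a - 1))

def log_alpha(x, a):
    return (x**(a - 1) - 1) / (a - 1)

def beta_div(p, r, lam, b):
    # discrete beta-divergence with reference weights lam
    return float(np.sum((p**b - b * p * r**(b - 1) + (b - 1) * r**b)
                        * lam**(1 - b) / (b * (b - 1))))

m = np.array([0.25, 0.25, 0.5])  # ground measure mu
p = np.array([0.2, 0.3, 0.5])
r = np.array([0.4, 0.4, 0.2])

for a in (-0.5, 0.5, 2.0, 3.0):
    F = lambda q: float(np.sum(m * f_alpha(q / m, a)))  # alpha-divergence from mu
    grad_r = log_alpha(r / m, a)                        # derivative of F at r
    bregman = F(p) - F(r) - float(grad_r @ (p - r))
    # equals the beta-divergence with beta = alpha and lambda = mu
    assert abs(bregman - beta_div(p, r, m, a)) < 1e-12
```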

### $\beta$ Generates $\beta$

**Theorem.** The $\beta$-divergence from $\mu$ (taking $\lambda = \mu$) is a fixed point of the Bregman transform: $d_{D_\beta(\cdot || \mu)}(\pi, \rho) = D_\beta(\pi || \rho)$.

*Proof.* By performing the same manipulations as before, we obtain

$$d_{D_\beta(\cdot || \mu)}(\pi, \rho) = \int \frac{p^\beta - \beta p r^{\beta - 1} + (\beta - 1) r^\beta}{\beta (\beta - 1)}\, d\mu = D_\beta(\pi || \rho),$$

where $p = \frac{d\pi}{d\mu}$ and $r = \frac{d\rho}{d\mu}$. $\blacksquare$
Thus, the $\beta$-divergence is stable under the Bregman transform, just like the KL divergence.
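The stability claim can also be sketched numerically in the discrete case (same hypothetical weight vectors as before; the generator is $G(q) = D_\beta(q || \mu)$ with $\lambda = \mu$):

```python
import numpy as np

def beta_div(p, r, lam, b):
    # discrete beta-divergence with reference weights lam
    return float(np.sum((p**b - b * p * r**(b - 1) + (b - 1) * r**b)
                        * lam**(1 - b) / (b * (b - 1))))

m = np.array([0.25, 0.25, 0.5])  # ground measure mu
p = np.array([0.2, 0.3, 0.5])
r = np.array([0.4, 0.4, 0.2])

for b in (-0.5, 0.5, 2.0, 3.0):
    G = lambda q: beta_div(q, m, m, b)                # beta-divergence from mu
    grad_r = (r**(b - 1) * m**(1 - b) - 1) / (b - 1)  # derivative of G at r
    bregman = G(p) - G(r) - float(grad_r @ (p - r))
    # the Bregman transform returns the beta-divergence itself
    assert abs(bregman - beta_div(p, r, m, b)) < 1e-12
```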