# The $\alpha$-Gaussian

13 Oct 2018

It would be a nice story to tell if the maximum $\alpha$-entropy distribution turned out to be the $\alpha$-Gaussian. But it wouldn’t deserve a post on its own, would it?

## Motivation

Do you remember the post where we computed the $\alpha$-divergence between two Gaussian distributions on the real line? Formula (2) derived there had the odd property that it applies only if a certain condition (3) on the variances is satisfied. This means, for example, that one cannot freely pick any two Gaussians and ask what the $2$-divergence between them is, because the $2$-divergence is only defined if the variance of the Gaussian in the numerator is smaller than twice the variance of the Gaussian in the denominator. Back then I could not explain this strange phenomenon; this post is an attempt to gain a better understanding of it.

### The $\alpha$-divergence

The $\alpha$-divergence is a generalization of the KL divergence
obtained by a smooth deformation of the logarithm function.
More concretely, let’s define a new function
$\log_\alpha \colon (0, \infty) \to \mathbb{R}$ where $\alpha \in \mathbb{R}$,
called the *$\alpha$-logarithm*,
\begin{equation}
\log_{\alpha}(x):=\frac{x^{\alpha-1}-1}{\alpha-1}.
\end{equation}
In the limit $\alpha \to 1$,
this function coincides with the familiar natural logarithm.
Note that $\alpha$ is just a real-valued parameter and not the base;
no confusion with the base should arise
because only the natural logarithm is used here.
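
As a quick sanity check of the definition, here is a minimal Python sketch (the function name `log_alpha` is just an illustrative choice, and $\alpha = 1$ is handled as a special case by falling back to the natural logarithm). It can be used to confirm numerically that the $\alpha$-logarithm approaches $\log(x)$ as $\alpha \to 1$.

```python
import math


def log_alpha(x, alpha):
    """alpha-logarithm of a positive number x (illustrative helper)."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if alpha == 1.0:
        # The limit alpha -> 1 is the natural logarithm.
        return math.log(x)
    return (x ** (alpha - 1.0) - 1.0) / (alpha - 1.0)


if __name__ == "__main__":
    # Values for alpha close to 1 should approach math.log(2.0).
    for alpha in (2.0, 1.1, 1.001, 1.0):
        print(alpha, log_alpha(2.0, alpha))
```

For $\alpha = 2$ the function is simply $x - 1$, the first-order approximation of $\log(x)$ around $x = 1$.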

The *$\alpha$-divergence* is defined analogously to the KL divergence,
\begin{equation}
\text{KL}_{\alpha}(p\|q) := \frac{1}{\alpha} \int_{-\infty}^{\infty}
p(x)\log_{\alpha}\left(\frac{p(x)}{q(x)}\right)dx,
\end{equation}
with the natural logarithm $\log(x)$ replaced by $\alpha^{-1}\log_\alpha(x)$.
For more details on the definition of the $\alpha$-divergence,
see the great article by Andrzej Cichocki and Shun-ichi Amari,
*Families of Alpha- Beta- and Gamma- Divergences:
Flexible and Robust Measures of Similarities* (Entropy, 2010).
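
To make the definition concrete, here is a minimal numerical sketch, assuming two zero-mean Gaussians and a plain midpoint rule on a truncated interval. The names `log_normal_pdf` and `kl_alpha_truncated` and the cutoff values are illustrative choices, not taken from the cited article. The integrand $p\,\alpha^{-1}\log_\alpha(p/q)$ is rewritten as $(p^{\alpha}q^{1-\alpha}-p)/(\alpha(\alpha-1))$ and evaluated in log space to avoid overflow.

```python
import math


def log_normal_pdf(x, sigma):
    """Log-density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def kl_alpha_truncated(alpha, sigma_p, sigma_q, cutoff, n=100_000):
    """Midpoint-rule estimate of KL_alpha(p||q) restricted to [-cutoff, cutoff].

    Assumes alpha is neither 0 nor 1.
    """
    h = 2.0 * cutoff / n
    total = 0.0
    for i in range(n):
        x = -cutoff + (i + 0.5) * h
        lp = log_normal_pdf(x, sigma_p)
        lq = log_normal_pdf(x, sigma_q)
        # p * log_alpha(p / q) / alpha = (p^alpha q^(1 - alpha) - p) / (alpha (alpha - 1))
        total += math.exp(alpha * lp + (1.0 - alpha) * lq) - math.exp(lp)
    return total * h / (alpha * (alpha - 1.0))


if __name__ == "__main__":
    for cutoff in (10.0, 20.0):
        print("sigma_p^2 = 1, sigma_q^2 = 4:",
              kl_alpha_truncated(2.0, 1.0, 2.0, cutoff))
        print("sigma_p^2 = 3, sigma_q^2 = 1:",
              kl_alpha_truncated(2.0, math.sqrt(3.0), 1.0, cutoff))
```

With $\sigma_p = 1$, $\sigma_q = 2$ the value stops changing once the cutoff is large enough, whereas with $\sigma_p^2 = 3$, $\sigma_q^2 = 1$ it keeps growing with the cutoff. The second pair violates the requirement that the numerator variance be smaller than twice the denominator variance, so this is the restriction on the $2$-divergence from the motivation showing up numerically.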

### The $\alpha$-exponential function

The KL divergence works so beautifully with exponential families because
the logarithm is the inverse of the exponential function.
However, if the logarithm gets replaced by the $\alpha$-logarithm,
this inverse relationship no longer holds.
That is why calculating the $\alpha$-divergence between Gaussians
is so cumbersome.
Thus, we are naturally led to introducing the inverse of the $\alpha$-logarithm,
called the *$\alpha$-exponential function*,
\begin{equation}
\exp_{\alpha}(y) := \sqrt[\alpha-1]{1+(\alpha-1)y}.
\end{equation}
The key property which is retained can be stated as
$\exp_{\alpha}(\log_{\alpha}(x)) = x.$
On the other hand, other nice properties are lost,
e.g., $\exp_{\alpha}(x+y) \neq \exp_{\alpha}(x) \exp_{\alpha}(y)$.
Moreover, the domain of the $\alpha$-exponential function depends on $\alpha$;
namely, the condition $1+(\alpha-1)y\geq0$ must be fulfilled
for $\exp_{\alpha}(y)$ to return a real number.
This domain restriction can be seen as a weakness
but also as a strength: it enables the definition of
finitely-supported $\alpha$-exponential families.
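
The properties above can be checked with a small Python sketch. The names `log_alpha` and `exp_alpha` are illustrative, and raising `ValueError` outside the domain is a design choice of this sketch rather than anything prescribed by the definition.

```python
import math


def log_alpha(x, alpha):
    """alpha-logarithm of a positive number x (illustrative helper)."""
    if alpha == 1.0:
        return math.log(x)
    return (x ** (alpha - 1.0) - 1.0) / (alpha - 1.0)


def exp_alpha(y, alpha):
    """alpha-exponential, the inverse of log_alpha on its domain."""
    if alpha == 1.0:
        return math.exp(y)
    base = 1.0 + (alpha - 1.0) * y
    # base < 0 has no real root; base == 0 with alpha < 1 would divide by zero.
    if base < 0.0 or (base == 0.0 and alpha < 1.0):
        raise ValueError("y is outside the domain of exp_alpha")
    return base ** (1.0 / (alpha - 1.0))


if __name__ == "__main__":
    # The inverse property is retained.
    print(exp_alpha(log_alpha(3.0, 2.5), 2.5))
    # Addition is no longer mapped to multiplication.
    print(exp_alpha(1.0 + 1.0, 2.0), exp_alpha(1.0, 2.0) * exp_alpha(1.0, 2.0))
```

For $\alpha = 2$ the function reduces to $\exp_2(y) = 1 + y$, which makes the failure of $\exp_{\alpha}(x+y) = \exp_{\alpha}(x)\exp_{\alpha}(y)$ and the domain restriction $y \geq -1$ easy to see by hand.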

## The $\alpha$-Gaussian distribution

By replacing the exponential function
in the probability density of the normal distribution
with the $\alpha$-exponential function,
we obtain the *$\alpha$-Gaussian probability density function*
\begin{equation}
N_{\alpha}\left(x|\mu,\sigma^{2}\right):=\frac{1}{c_{\alpha}}
\text{e}_{\alpha}^{-\frac{1}{2}\frac{\left(x-\mu\right)^{2}}{\sigma^{2}}}
\end{equation}
where $\text{e}_{\alpha}^{y}$ is shorthand for $\exp_{\alpha}(y)$
and the normalization constant is given by
\begin{equation}
c_{\alpha}:=\begin{cases}
\sqrt{\frac{2\pi\sigma^{2}}{1-\alpha}}
\frac{\Gamma\left(\frac{1}{2}\frac{1+\alpha}{1-\alpha}\right)}
{\Gamma\left(\frac{1}{1-\alpha}\right)},
& \alpha\in(-1,1],\quad x\in\mathbb{R}, \\
\frac{\sqrt{2\pi\sigma^{2}}}{\left(\alpha-1\right)^{\frac{3}{2}}}
\frac{\Gamma\left(\frac{1}{\alpha-1}\right)}
{\Gamma\left(\frac{1}{2}+\frac{\alpha}{\alpha-1}\right)},
& \alpha>1,\quad x\in\left(\mu-\sigma\sqrt{\frac{2}{\alpha-1}},
\mu+\sigma\sqrt{\frac{2}{\alpha-1}}\right).
\end{cases}
\end{equation}
Curiously, the support of the $\alpha$-Gaussian density depends on $\alpha$:
for $\alpha > 1$, the support is finite;
for $\alpha \in (-1, 1]$, it is infinite;
and for $\alpha \leq -1$, the support is still infinite but the density
cannot be normalized.
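
The closed-form constants can be cross-checked numerically. The following is a rough sketch under a few assumptions: $\mu = 0$, a plain midpoint rule, and the substitution $x = \tan\theta$ to handle the infinite support for $\alpha < 1$. The function names are illustrative. For $\alpha > 1$ the integration limits are taken from the condition $1+(\alpha-1)y \geq 0$ with $y = -\frac{x^2}{2\sigma^2}$.

```python
import math


def c_alpha(alpha, sigma=1.0):
    """Closed-form normalization constant from the text."""
    if -1.0 < alpha < 1.0:
        return (math.sqrt(2.0 * math.pi * sigma ** 2 / (1.0 - alpha))
                * math.gamma(0.5 * (1.0 + alpha) / (1.0 - alpha))
                / math.gamma(1.0 / (1.0 - alpha)))
    if alpha > 1.0:
        return (math.sqrt(2.0 * math.pi * sigma ** 2) / (alpha - 1.0) ** 1.5
                * math.gamma(1.0 / (alpha - 1.0))
                / math.gamma(0.5 + alpha / (alpha - 1.0)))
    raise ValueError("alpha must lie in (-1, 1) or be greater than 1")


def unnormalized(x, alpha, sigma=1.0):
    """exp_alpha(-x^2 / (2 sigma^2)), set to zero outside the support."""
    base = 1.0 - (alpha - 1.0) * 0.5 * (x / sigma) ** 2
    return base ** (1.0 / (alpha - 1.0)) if base > 0.0 else 0.0


def c_alpha_numeric(alpha, sigma=1.0, n=100_000):
    """Midpoint-rule estimate of the integral of the unnormalized density."""
    total = 0.0
    if alpha > 1.0:
        a = sigma * math.sqrt(2.0 / (alpha - 1.0))  # half-width of the support
        h = 2.0 * a / n
        for i in range(n):
            total += unnormalized(-a + (i + 0.5) * h, alpha, sigma)
        return total * h
    # Infinite support: substitute x = tan(theta), dx = dtheta / cos(theta)^2.
    h = math.pi / n
    for i in range(n):
        theta = -0.5 * math.pi + (i + 0.5) * h
        total += unnormalized(math.tan(theta), alpha, sigma) / math.cos(theta) ** 2
    return total * h


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 2.0, 3.0):
        print(alpha, c_alpha(alpha), c_alpha_numeric(alpha))
```

The midpoint rule is only adequate here for moderate $\alpha$; as $\alpha$ approaches $-1$ the transformed integrand becomes singular at the endpoints and a more careful quadrature would be needed.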

An intuitive way to understand the effect of $\alpha$
is to start at very large $\alpha$
and see how decreasing $\alpha$ towards $-\infty$
changes the probability density function.
For very large $\alpha$,
the density $N_\alpha\left(x|\mu,\sigma^2\right)$
is entirely concentrated around the mean $\mu$,
because the condition $1+(\alpha-1)y\geq0$ with
$y=-\frac{(x-\mu)^2}{2\sigma^2}$ restricts the support to
$|x-\mu|\leq\sigma\sqrt{2/(\alpha-1)}$;
the parameter $\sigma$ thus rescales the density together with its support.
When we decrease $\alpha$ towards $\alpha = 1$, the density starts spreading
out, growing towards the normal density, which it matches in the limit
$\alpha \to 1$.
If we keep decreasing $\alpha$, the tails of the $\alpha$-Gaussian density
function become heavier, enveloping the Gaussian tails;
as $\alpha \to -1$, the tails become so heavy that the integral of the
density diverges and it can no longer be normalized.
From that point on, decreasing $\alpha$ further makes the function look
more and more like a constant function,
matching it in the limit $\alpha \to -\infty$.
Thus, *by changing $\alpha$ we control the tails of the distribution.*

For completeness, in the standardized case $\mu = 0$, $\sigma = 1$,
we can obtain the cumulative distribution function
\begin{equation}
F_{\alpha}(x):=\frac{1}{2}+\frac{x}{c_{\alpha}}\,_{2}F_{1}
\left(
\frac{1}{2},\frac{1}{1-\alpha};\frac{3}{2};
-\frac{\left(1-\alpha\right)}{2}x^{2}
\right)
\end{equation}
where $\,_{2}F_{1}$ is the hypergeometric function and $x$ lies in the support;
dividing by $c_{\alpha}$ ensures that $F_{\alpha}$ runs from $0$ to $1$.
One could use it, for example, to sample from the $\alpha$-Gaussian distribution
via inverse transform sampling; however, inverting it is not straightforward.
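
As an illustration of inverse transform sampling, consider the special case $\alpha = 2$, $\mu = 0$, $\sigma = 1$. There the second argument of the hypergeometric function is $-1$, so its series terminates and the factor reduces to $1 - x^2/6$. The CDF normalized to run from $0$ to $1$ is then $\frac{1}{2} + (x - x^3/6)/c_2$ with $c_2 = 4\sqrt{2}/3$ on $|x| < \sqrt{2}$. The sketch below inverts it by bisection; the function names, the tolerance, and the choice of bisection are assumptions made for this example only.

```python
import math
import random

SUPPORT = math.sqrt(2.0)          # half-width of the support for alpha = 2, sigma = 1
C_2 = 4.0 * math.sqrt(2.0) / 3.0  # normalization constant c_alpha at alpha = 2


def cdf_2gaussian(x):
    """CDF of the standardized 2-Gaussian (mu = 0, sigma = 1)."""
    if x <= -SUPPORT:
        return 0.0
    if x >= SUPPORT:
        return 1.0
    return 0.5 + (x - x ** 3 / 6.0) / C_2


def sample_2gaussian(rng, tol=1e-12):
    """Draw one sample by solving cdf_2gaussian(x) = u with bisection."""
    u = rng.random()
    lo, hi = -SUPPORT, SUPPORT
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf_2gaussian(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_2gaussian(rng) for _ in range(10_000)]
    mean = sum(samples) / len(samples)
    print("sample mean:", mean)
    print("largest |x|:", max(abs(s) for s in samples))
```

For general $\alpha$ the polynomial shortcut is not available, and one would need a numerical implementation of $\,_{2}F_{1}$ inside the same bisection loop.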

## $\text{KL}_\alpha$ between $\alpha$-Gaussians

The motivation behind introducing the $\alpha$-Gaussian distribution was to simplify the computation of the $\alpha$-divergence. Unfortunately, since the $\alpha$-exponential function is not a homomorphism between addition and multiplication, one cannot nicely transform the ratio of densities into a single exponential which could then be removed by a logarithm, the way it is done in the KL divergence. Therefore, computing $\text{KL}_\alpha$ between $\alpha$-Gaussians is actually not easier than computing $\text{KL}_\alpha$ between usual Gaussians. Moreover, even computing the ordinary $\text{KL}$ divergence between $\alpha$-Gaussians analytically appears to be infeasible.

## Max $\alpha$-entropy distribution

Let us get back to the question posed at the very beginning: is the $\alpha$-Gaussian the maximum $\alpha$-entropy distribution, similar to how the Gaussian distribution is the maximum entropy distribution? The answer is unfortunately no, and furthermore, it seems to be intractable to analytically derive any $\alpha$-exponential family from the maximum $\alpha$-entropy principle. The reason lies again in the fact that even though we can write down the density in the form $p_\alpha(x) = \text{e}_{\alpha}^{-\eta-\lambda x-\nu x^{2}}$, we can neither normalize it nor evaluate the moments because $\alpha$-exponentiation doesn’t transform addition into multiplication.

## Conclusion

Alas, the $\alpha$-Gaussian distribution didn’t help us to simplify the computation of the $\alpha$-divergence. However, we have seen that by varying $\alpha$ one can generate either heavy-tailed distributions or light-tailed finitely-supported distributions. The fact that the $2$-Gaussian is finitely-supported hints at why the $2$-KL between Gaussians is finite only on a subset of pairs of distributions, but it does not provide a direct explanation. Estimating the parameters of the $\alpha$-Gaussian distribution from data using maximum likelihood is probably intractable because the $\alpha$-logarithm does not turn a product into a sum. Sampling also does not appear to be straightforward. With all these disadvantages, the $\alpha$-Gaussian distribution is unlikely to become a useful tool in the statistical toolbox, but is rather doomed to remain a bizarre animal in the information geometry zoo.