Probability distribution vs cumulative distribution function

In this post, I collect the definitions of the basic concepts of probability theory in the language of measure theory, following Kolmogorov, with a bit of modern terminology and an emphasis on the intuition behind them.

Random variable

Let $\big(\Omega, \mathcal{A}, \mathbb{P}\big)$ be a given probability space.

A random variable $X$ is a measurable function $$ \begin{equation*} X : \Omega \rightarrow \mathbb{R}. \end{equation*} $$

Measurability means that $X^{-1}(B) \in \mathcal{A}$ for all $B \in \mathcal{B}(\mathbb{R})$, where $\mathcal{B}(\mathbb{R})$ is the Borel $\sigma$-algebra on $\mathbb{R}$.

Probability distribution

A random variable $X : \Omega \rightarrow \mathbb{R}$, apart from mapping points from $\Omega$ to $\mathbb{R}$, carries over the measure $\mathbb{P}$ from $\Omega$ to $\mathbb{R}$.

The probability distribution $\mu$ of a random variable $X$ is the push-forward measure $\mu : \mathcal{B}(\mathbb{R}) \rightarrow [0, 1]$ (denoted $\mu = X_* \mathbb{P}$) defined by the relation $$ \begin{equation*} (X_* \mathbb{P})(B) = \mathbb{P}(X^{-1}(B)) \end{equation*} $$ for all $B \in \mathcal{B}(\mathbb{R})$.

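The push-forward measure has a nice property (the change-of-variables formula) $$ \begin{equation*} \int_\mathbb{R} Y \,\mathrm{d}(X_* \mathbb{P}) = \int_\Omega Y \circ X \,\mathrm{d}\mathbb{P} \end{equation*} $$

for any random variable $Y : \mathbb{R} \rightarrow \mathbb{R}$ for which either of the integrals exists.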
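To make the push-forward concrete, here is a minimal sketch on a finite probability space, where integrals reduce to finite sums. The two-dice space, the choice of $X$ as the sum of the dice, and $Y(x) = x^2$ are illustrative assumptions, not part of the definitions above. The sketch builds $\mu = X_* \mathbb{P}$ and checks that integrating $Y$ against $\mu$ over $\mathbb{R}$ gives the same number as integrating $Y \circ X$ against $\mathbb{P}$ over $\Omega$.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite probability space: two fair dice, Omega = {(i, j)}.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}


def X(w):
    """Random variable X: the sum of the two dice."""
    return w[0] + w[1]


def pushforward(P, X):
    """Return mu = X_* P as a dict with mu({x}) = P(X^{-1}({x}))."""
    mu = defaultdict(Fraction)
    for w, p in P.items():
        mu[X(w)] += p
    return dict(mu)


def Y(x):
    """A measurable function on the real line (an arbitrary choice: x squared)."""
    return x * x


mu = pushforward(P, X)

# Integral of Y with respect to mu over R (a finite sum here).
lhs = sum(Y(x) * m for x, m in mu.items())
# Integral of Y o X with respect to P over Omega.
rhs = sum(Y(X(w)) * p for w, p in P.items())

print(mu[7], lhs, rhs)
```

Exact rational arithmetic via `Fraction` is used so that the two sides can be compared with `==` rather than up to rounding error.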

Expectation

The expectation $\mathbb{E} Y$ of a random variable $Y$ is the Lebesgue integral $$ \begin{equation*} \mathbb{E} Y = \int_\mathbb{R} Y \,\mathrm{d} \mu. \end{equation*} $$

You can think of it in the following way. A random variable $X$ pushes some abstract measure $\mathbb{P}$ from $\Omega$ to a measure $\mu$ on $\mathbb{R}$ (to a Gaussian measure, for example). After that we can forget about $\Omega$ altogether since all observable quantities are of the form $\mathbb{E} Y$, where $Y : x \mapsto Y(x)$ is a measurable function on $\big(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu\big)$, and $\mu$ carries all the information required to compute $\mathbb{E} Y$.

In particular, if $Y$ is the identity function $Y = \mathrm{Id} : x \mapsto x$, the expectation $\mathbb{E} Y$ gives the expected value of $X$. Using the nice property of the push-forward measure $\mu$ and the fact that $(\mathrm{Id} \circ X)(\omega) = \mathrm{Id}(X(\omega)) = X(\omega)$, we obtain $$ \begin{equation*} \mathbb{E} Y = \int_\mathbb{R} x \,\mathrm{d}\mu(x) = \int_\Omega X(\omega) \,\mathrm{d}\mathbb{P}(\omega). \end{equation*} $$

This result allows us to compute the expectation of $X$ if we don’t know $\mathbb{P}$ but know that $X$ is distributed according to $\mu$ (denoted $X \sim \mu$). It is, basically, the idea behind introducing $\Omega$ in the first place. We think of $\Omega$ as some invisible space where someone is throwing dice, while we can only observe consequences of that activity in our space $\mathbb{R}$.

Probability density function

If $\mu$ is absolutely continuous with respect to $\lambda$ (denoted $\mu \ll \lambda$), where $\lambda$ is the Lebesgue measure on $\mathbb{R}$, then by the Radon-Nikodym theorem there exists the Radon-Nikodym derivative $f$, which allows one to change the measure under the integral sign.

The probability density function $f : \mathbb{R} \rightarrow [0, \infty)$ of a random variable $X : \Omega \rightarrow \mathbb{R}$ with distribution $\mu = X_*\mathbb{P}$ is the Radon-Nikodym derivative $f = \mathrm{d} \mu \,/\, \mathrm{d} \lambda$.

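With the help of the probability density $f$, we can rewrite the expectation of $Y$ as $$ \begin{equation*} \mathbb{E} Y = \int_\mathbb{R} Y \,\mathrm{d}\mu = \int_\mathbb{R} Y f \,\mathrm{d}\lambda = \int_{-\infty}^{\infty} Y(x) f(x) \,\mathrm{d}x. \end{equation*} $$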
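Once a density is available, the expectation becomes an ordinary integral of $Y(x) f(x)$ over the real line, which can be approximated numerically. The sketch below assumes $\mu = \mathcal{N}(0, 1)$ purely for illustration. The helper name `expectation`, the truncation of the real line to $[-10, 10]$ and the midpoint rule are choices made for this example only.

```python
import math


def f(x):
    """Density of N(0, 1) with respect to the Lebesgue measure."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def expectation(Y, density, a=-10.0, b=10.0, n=100_000):
    """Midpoint-rule approximation of the integral of Y(x) * density(x) over [a, b].

    Assumes the mass of the density outside [a, b] is negligible.
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += Y(x) * density(x)
    return total * h


mean = expectation(lambda x: x, f)               # Y = Id
second_moment = expectation(lambda x: x * x, f)  # Y : x -> x^2
print(mean, second_moment)
```

Note that only $f$ enters the computation. Neither $\Omega$ nor $\mathbb{P}$ appears anywhere in the code.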

Cumulative distribution function

The cumulative distribution function (CDF) $F : \mathbb{R} \rightarrow [0, 1]$ of a random variable $X$ is defined by $$ \begin{equation*} F(x) = \mu \big( (-\infty, x] \big). \end{equation*} $$

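Using the Riemann–Stieltjes integral and the CDF, we can rewrite the expectation in yet another way $$ \begin{equation*} \mathbb{E} Y = \int_\mathbb{R} Y \,\mathrm{d}F \end{equation*} $$

or even more explicitly $$ \begin{equation*} \mathbb{E} Y = \int_{-\infty}^{\infty} Y(x) \,\mathrm{d}F(x). \end{equation*} $$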
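A Riemann–Stieltjes integral against $F$ is a limit of sums of the form $\sum_k Y(t_k) \big( F(x_{k+1}) - F(x_k) \big)$, so the expectation can be approximated from the CDF alone, without ever writing down a density. The following sketch again assumes $\mu = \mathcal{N}(0, 1)$ for illustration. The function name `stieltjes_expectation`, the interval $[-10, 10]$ and the midpoint tags $t_k$ are assumptions of this example.

```python
import math


def F(x):
    """CDF of N(0, 1), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stieltjes_expectation(Y, F, a=-10.0, b=10.0, n=100_000):
    """Riemann-Stieltjes sum of Y against F on a uniform partition of [a, b].

    Each cell [x_k, x_{k+1}] contributes Y(midpoint) * (F(x_{k+1}) - F(x_k)).
    Assumes F(a) is close to 0 and F(b) is close to 1.
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        left = a + k * h
        right = a + (k + 1) * h
        total += Y(0.5 * (left + right)) * (F(right) - F(left))
    return total


mean = stieltjes_expectation(lambda x: x, F)
second_moment = stieltjes_expectation(lambda x: x * x, F)
print(mean, second_moment)
```

Each increment $F(x_{k+1}) - F(x_k)$ is exactly $\mu\big((x_k, x_{k+1}]\big)$, which is how the CDF encodes the measure of intervals.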

To answer the question in the title of this post, a CDF and a probability distribution are closely related concepts, but they are two different things. A probability distribution is a law that assigns a real number to every measurable subset of a given set, while a CDF assigns numbers only to the half-lines $(-\infty, x]$ in $\mathbb{R}$. Thus, a probability distribution is a more general object, as it can be defined on any measurable space, not only on $\mathbb{R}$.

Measure theory in action

Let’s see how the machinery we’ve developed works on a simple example.

Problem. Let $\mu = \mathcal{N}(0, 1)$ and $X \sim \mu$. If $Y = X^2$, what is its distribution $\nu$?

Solution. Since $X^2 \leq x$ if and only if $-\sqrt{x} \leq X \leq \sqrt{x}$, it is easy to see that $G(x) = \nu \big( (-\infty, x] \big) = \mu \big( \left[-\sqrt{x}, \sqrt{x}\right] \big)$ for $x \geq 0$, and $G(x) = 0$ for $x < 0$. That already solves the problem, but if desired, we can go further and obtain the density $g$ of $\nu$ by differentiating $G(x)$.
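For completeness, here is a sketch of that differentiation. The symbols $\Phi$ and $\varphi$ are notation introduced only for this derivation: $\Phi$ denotes the CDF of $\mu = \mathcal{N}(0, 1)$ and $\varphi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$ its density. For $x > 0$, using the symmetry $\Phi(-t) = 1 - \Phi(t)$ and the chain rule, $$ \begin{equation*} \begin{aligned} G(x) &= \Phi\big(\sqrt{x}\big) - \Phi\big(-\sqrt{x}\big) = 2\,\Phi\big(\sqrt{x}\big) - 1, \\ g(x) &= G'(x) = 2\,\varphi\big(\sqrt{x}\big) \cdot \frac{1}{2\sqrt{x}} = \frac{1}{\sqrt{2 \pi x}}\, e^{-x/2}, \end{aligned} \end{equation*} $$ and $g(x) = 0$ for $x < 0$. This is the density of the chi-squared distribution with one degree of freedom.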

Problem. What is the expected value of $Y$?

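Solution. Since we have both $\mu$ and $\nu$, we can compute $\mathbb{E}_\mathbb{P} Y$ in two ways $$ \begin{equation*} \mathbb{E}_\mathbb{P} Y = \int_\mathbb{R} x \,\mathrm{d}\nu(x) = \int_\mathbb{R} x^2 \,\mathrm{d}\mu(x) = \mathbb{E}_\mu X^2. \end{equation*} $$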

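If that does not impress you, look at what it means for densities $$ \begin{equation*} \int_0^\infty x \, \frac{1}{\sqrt{2 \pi x}} \, e^{-x/2} \,\mathrm{d}x = \int_{-\infty}^{\infty} x^2 \, \frac{1}{\sqrt{2 \pi}} \, e^{-x^2/2} \,\mathrm{d}x. \end{equation*} $$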

Since $X$ has zero mean, $\mathbb{E}_\mu X^2$ is the variance of $X$, which is given and equals $1$. Therefore, we conclude that the mean of the chi-squared distribution with one degree of freedom equals the variance of the standard normal distribution.
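The equality of the two expectations can also be checked numerically. The sketch below integrates $x \, g(x)$ over $(0, \infty)$, with $g(x) = \frac{1}{\sqrt{2 \pi x}} e^{-x/2}$ the chi-squared density with one degree of freedom, and $x^2 f(x)$ over $\mathbb{R}$, with $f$ the standard normal density. The helper `midpoint`, the truncation limits $60$ and $\pm 10$, and the number of grid cells are assumptions made for this example.

```python
import math


def midpoint(h_fn, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of h_fn over [a, b]."""
    step = (b - a) / n
    return step * sum(h_fn(a + (k + 0.5) * step) for k in range(n))


def f(x):
    """Density of the standard normal distribution mu."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def g(x):
    """Density of nu, the chi-squared distribution with one degree of freedom (x > 0)."""
    return math.exp(-0.5 * x) / math.sqrt(2.0 * math.pi * x)


# Expectation computed on the side of nu: integral of x * g(x) over (0, infinity).
# The midpoint rule never evaluates g at x = 0, where g is unbounded.
mean_chi2 = midpoint(lambda x: x * g(x), 0.0, 60.0)
# Expectation computed on the side of mu: integral of x^2 * f(x) over R.
var_normal = midpoint(lambda x: x * x * f(x), -10.0, 10.0)

print(mean_chi2, var_normal)
```

Both numbers should come out close to $1$, up to discretization and truncation error.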

References

The basics of probability theory are well explained in the lecture notes on Stochastic Calculus, Filtering, and Stochastic Control by Ramon van Handel and in Probability and Stochastic Processes with Applications by Oliver Knill.