Probability distribution vs cumulative distribution function
21 Dec 2016
In this post, I collect definitions of the basic concepts of probability theory in the language of measure theory, following Kolmogorov, with a bit of modern terminology and an emphasis on the intuition behind them.
Random variable
Let $\big(\Omega, \mathcal{A}, \mathbb{P}\big)$ be a given probability space. A function $X : \Omega \rightarrow \mathbb{R}$ is called a random variable if it is $\mathcal{A}$-measurable.
It means $X^{-1}(B) \in \mathcal{A} \; \forall B \in \mathcal{B}(\mathbb{R})$, where $\mathcal{B}(\mathbb{R})$ is the Borel $\sigma$-algebra on $\mathbb{R}$.
Probability distribution
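A random variable $X : \Omega \rightarrow \mathbb{R}$, apart from mapping points from $\Omega$ to $\mathbb{R}$, carries over the measure $\mathbb{P}$ from $\Omega$ to $\mathbb{R}$. The resulting push-forward measure
$$\mu(B) = \mathbb{P}\big(X^{-1}(B)\big), \quad B \in \mathcal{B}(\mathbb{R}),$$
is called the probability distribution of $X$.
The push-forward measure has a nice property
$$\int_\Omega (Y \circ X)(\omega) \, d\mathbb{P}(\omega) = \int_\mathbb{R} Y(x) \, d\mu(x)$$
for any random variable $Y : \mathbb{R} \rightarrow \mathbb{R}$ for which either of the integrals exists.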
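The property says that integrating $Y \circ X$ against $\mathbb{P}$ over $\Omega$ gives the same number as integrating $Y$ against the push-forward measure $\mu$ over $\mathbb{R}$. Below is a minimal sketch on a finite probability space, where both integrals reduce to sums. The die, the map `X`, and the function `Y` are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite probability space: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

def X(w):
    """Illustrative random variable: remainder of the roll modulo 3."""
    return w % 3

def Y(x):
    """Illustrative measurable function on the target space."""
    return x * x + 1

# Push-forward measure on points: mu({x}) = P(X^{-1}({x})).
mu = defaultdict(Fraction)
for w in omega:
    mu[X(w)] += P[w]

# Integral of Y∘X over Omega with respect to P.
lhs = sum(Y(X(w)) * P[w] for w in omega)
# Integral of Y over the image of X with respect to mu.
rhs = sum(Y(x) * mass for x, mass in mu.items())

print(lhs, rhs)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to rounding.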
Expectation
The expectation of a measurable function $Y : \mathbb{R} \rightarrow \mathbb{R}$ is its integral with respect to $\mu$,
$$\mathbb{E} Y = \int_\mathbb{R} Y(x) \, d\mu(x).$$
You can think of it in the following way. A random variable $X$ pushes some abstract measure $\mathbb{P}$ from $\Omega$ to a measure $\mu$ on $\mathbb{R}$ (to a Gaussian measure, for example). After that we can forget about $\Omega$ altogether, since all observable quantities are of the form $\mathbb{E} Y$, where $Y : x \mapsto Y(x)$ is a measurable function on $\big(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu\big)$, and $\mu$ carries all the information required to compute $\mathbb{E} Y$.
In particular, if $Y$ is the identity function $Y = \mathrm{Id} : x \mapsto x$, the expectation $\mathbb{E} Y$ gives the expected value of $X$. Using the nice property of the push-forward measure $\mu$ and the fact that $(\mathrm{Id} \circ X)(\omega) = \mathrm{Id}(X(\omega)) = X(\omega)$, we obtain
$$\mathbb{E} X = \int_\Omega X(\omega) \, d\mathbb{P}(\omega) = \int_\mathbb{R} x \, d\mu(x).$$
This result allows us to compute the expectation of $X$ if we don’t know $\mathbb{P}$ but know that $X$ is distributed according to $\mu$ (denoted $X \sim \mu$). It is, basically, the idea behind introducing $\Omega$ in the first place. We think of $\Omega$ as some invisible space where someone is throwing dice, while we can only observe consequences of that activity in our space $\mathbb{R}$.
Probability density function
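If $\mu$ is absolutely continuous with respect to $\lambda$ (denoted $\mu \ll \lambda$), where $\lambda$ is the Lebesgue measure on $\mathbb{R}$, then there exists the Radon-Nikodym derivative $f = d\mu / d\lambda$, called the probability density function, which allows one to change the measure under the integral
$$\int_\mathbb{R} Y \, d\mu = \int_\mathbb{R} Y f \, d\lambda.$$
With the help of the probability density $f$, we can rewrite the expectation of $Y$
$$\mathbb{E} Y = \int_\mathbb{R} Y(x) f(x) \, dx.$$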
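When a density exists, the expectation becomes an ordinary integral of $Y(x) f(x)$ over $\mathbb{R}$, which can be approximated numerically. The sketch below assumes $\mu$ is the standard normal distribution and takes $Y = \cos$, for which the exact value $\mathbb{E} \cos(X) = e^{-1/2}$ is known. The helper name `expectation_via_density`, the truncation interval and the grid size are illustrative assumptions.

```python
import math
from statistics import NormalDist

# Density f of mu with respect to the Lebesgue measure (standard normal assumed).
f = NormalDist(mu=0.0, sigma=1.0).pdf

def expectation_via_density(Y, a=-10.0, b=10.0, n=20_000):
    """Midpoint-rule approximation of the integral of Y(x) f(x) dx over [a, b].

    The interval [-10, 10] is an assumption: the normal mass outside it
    is negligible for bounded Y.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += Y(x) * f(x)
    return total * h

approx = expectation_via_density(math.cos)
exact = math.exp(-0.5)  # E cos(X) for X ~ N(0, 1)
print(approx, exact)
```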
Cumulative distribution function
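The cumulative distribution function (CDF) of $\mu$ is defined as $F(x) = \mu\big((-\infty, x]\big)$.
Using the Riemann–Stieltjes integral and the CDF, we can rewrite the expectation in yet another way
$$\mathbb{E} Y = \int_\mathbb{R} Y(x) \, dF(x)$$
or even more explicitly
$$\mathbb{E} Y = \int_\mathbb{R} Y(x) \, d\mu\big((-\infty, x]\big).$$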
To answer the question in the title of this post, CDF and probability distribution are closely related concepts, but they are two different things. A probability distribution is a law that assigns a real number to every measurable subset of a given set, while a CDF assigns numbers only to intervals of the form $(-\infty, x]$ in $\mathbb{R}$. Thus, probability distribution is a more general object, as it can be defined on any measurable space, not only on $\mathbb{R}$.
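Since a CDF only reports the mass of intervals $(-\infty, x]$, an expectation can be recovered from it by weighting $Y$ with increments $F(x_{i+1}) - F(x_i)$, which is the idea of a Riemann–Stieltjes sum. Here is a rough sketch, assuming the standard normal CDF and $Y(x) = |x|$, whose exact expectation is $\sqrt{2/\pi}$. The function name `stieltjes_sum` and the grid parameters are hypothetical.

```python
import math
from statistics import NormalDist

# CDF F(x) = mu((-inf, x]) of the standard normal distribution (an assumption).
F = NormalDist(0.0, 1.0).cdf

def stieltjes_sum(Y, F, a=-10.0, b=10.0, n=20_000):
    """Riemann–Stieltjes sum of Y with respect to F on a uniform grid of [a, b].

    Each cell (left, right] contributes Y(midpoint) times the mass
    F(right) - F(left) that the distribution assigns to that cell.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        left = a + i * h
        right = left + h
        total += Y(0.5 * (left + right)) * (F(right) - F(left))
    return total

approx = stieltjes_sum(abs, F)
exact = math.sqrt(2.0 / math.pi)  # E|X| for X ~ N(0, 1)
print(approx, exact)
```

Note that the sum never touches a density, so the same recipe applies to distributions that do not have one.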
Measure theory in action
Let’s see how the machinery we’ve developed works on a simple example.
Problem. Let $\mu = \mathcal{N}(0, 1)$ and $X \sim \mu$. If $Y = X^2$, what is its distribution $\nu$?
Solution. It is easy to see that $G(x) = \nu \big( (-\infty, x] \big) = \mu \big( \left[-\sqrt{x}, \sqrt{x}\right] \big)$ for $x \geq 0$, and $G(x) = 0$ for $x < 0$. That already solves the problem, but if desired, we can go further and obtain the density $g$ of $\nu$ by differentiating $G(x)$
$$g(x) = \frac{1}{\sqrt{2 \pi x}} e^{-x/2}, \quad x > 0.$$
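The solution can be checked numerically. The sketch below builds $G$ from the standard normal CDF, differentiates it with a central difference, and compares the result with the closed-form density of the chi-squared distribution with one degree of freedom, $e^{-x/2} / \sqrt{2 \pi x}$. The names `G`, `g_numeric`, `g_closed_form` and the step size are illustrative assumptions.

```python
import math
from statistics import NormalDist

mu = NormalDist(0.0, 1.0)  # distribution of X

def G(x):
    """CDF of Y = X^2: the mu-mass of the interval [-sqrt(x), sqrt(x)]."""
    if x <= 0.0:
        return 0.0
    r = math.sqrt(x)
    return mu.cdf(r) - mu.cdf(-r)

def g_numeric(x, h=1e-5):
    """Central-difference approximation of G'(x); assumes x > h."""
    return (G(x + h) - G(x - h)) / (2.0 * h)

def g_closed_form(x):
    """Chi-squared density with one degree of freedom, valid for x > 0."""
    return math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi * x)

for x in (0.5, 1.0, 2.0):
    print(x, g_numeric(x), g_closed_form(x))
```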
Problem. What is the expected value of $Y$?
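Solution. Since we have both $\mu$ and $\nu$, we can compute $\mathbb{E}_\mathbb{P} Y$ in two ways
$$\mathbb{E}_\mathbb{P} Y = \int_\mathbb{R} x \, d\nu(x) = \int_\mathbb{R} x^2 \, d\mu(x).$$
If that does not impress you, look at what it means for densities
$$\int_0^\infty x \, \frac{1}{\sqrt{2 \pi x}} e^{-x/2} \, dx = \int_{-\infty}^{\infty} x^2 \, \frac{1}{\sqrt{2 \pi}} e^{-x^2/2} \, dx.$$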
$\mathbb{E}_\mu X^2$ is the variance of $X$ (because $X$ has zero mean), which is given and equals $1$. Therefore, we conclude that the mean of the Chi-squared distribution with one degree of freedom equals the variance of the standard normal distribution.
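As a sanity check, both routes to the expected value can be evaluated numerically: integrating $y \, g(y)$ against the chi-squared density on $(0, \infty)$, and integrating $x^2 f(x)$ against the standard normal density on $\mathbb{R}$. This is only a sketch; the helper `midpoint`, the truncation bounds and the grid sizes are assumptions chosen so that the neglected tails are tiny.

```python
import math

def midpoint(func, a, b, n):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """Standard normal density (density of mu)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def g(y):
    """Chi-squared density with one degree of freedom (density of nu), y > 0."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)

# Mean of Y computed on the nu side: integral of y g(y) dy.
# The midpoint rule never evaluates g at y = 0, where g is unbounded.
mean_via_nu = midpoint(lambda y: y * g(y), 0.0, 60.0, 200_000)

# Mean of Y computed on the mu side: integral of x^2 f(x) dx.
mean_via_mu = midpoint(lambda x: x * x * f(x), -10.0, 10.0, 20_000)

print(mean_via_nu, mean_via_mu)
```

Both numbers should agree with each other and with the variance of the standard normal distribution up to discretization error.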
References
The basics of probability theory are well explained in the lecture notes on Stochastic Calculus, Filtering, and Stochastic Control by Ramon van Handel and in Probability and Stochastic Processes with Applications by Oliver Knill.