cumulative distribution function

From the MIT 6.436 lecture notes.

1 definition: CDF

Let \(X\) be a random variable. Then the function \(F_X : \mathbb{R} \rightarrow [0,1]\) defined by \[ F_X(x) = \mathbb{P}(X \leq x) \] is called the cumulative distribution function (CDF) of \(X\).
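For intuition, here is a quick numerical sketch (using numpy; Exponential(1) is an arbitrary choice of example distribution, with CDF \(1 - e^{-x}\) for \(x \geq 0\)). The empirical fraction of samples at or below \(x\) should approach \(F_X(x)\):

  import numpy as np

  rng = np.random.default_rng(0)

  # Samples from Exponential(1), whose CDF is F_X(x) = 1 - exp(-x) for x >= 0
  samples = rng.exponential(scale=1.0, size=100_000)

  for x in [0.5, 1.0, 2.0]:
      empirical = np.mean(samples <= x)  # fraction of samples with X <= x
      exact = 1 - np.exp(-x)             # closed-form CDF
      print(f"x={x}: empirical {empirical:.4f} vs exact {exact:.4f}")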

2 properties

Let \(X\) be a random variable with CDF \(F\). Then \(F\) has the following properties:

  1. (monotonicity) if \(x \leq y\), then \(F(x) \leq F(y)\)
  2. (limiting values) \(\lim_{x\rightarrow -\infty} F(x) = 0\) and \(\lim_{x\rightarrow \infty} F(x) = 1\)
  3. (right continuity) for every \(x\), we have \(\lim_{y \downarrow x} F(y) = F(x)\)
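These properties are easy to sanity-check numerically. Here is a small sketch, taking the standard normal CDF from scipy as a concrete choice of \(F\):

  import numpy as np
  from scipy.stats import norm

  F = norm.cdf  # standard normal CDF as a concrete example

  # 1. monotonicity: F is non-decreasing along an increasing grid
  grid = np.linspace(-10, 10, 1001)
  assert np.all(np.diff(F(grid)) >= 0)

  # 2. limiting values: F vanishes far to the left, approaches 1 far to the right
  assert F(-1e9) < 1e-12 and F(1e9) > 1 - 1e-12

  # 3. right continuity: F(x + eps) approaches F(x) as eps decreases to 0
  x = 0.3
  for eps in [1e-1, 1e-3, 1e-6]:
      print(f"eps={eps:g}: F(x+eps) - F(x) = {F(x + eps) - F(x):.2e}")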

3 getting the probability law from a CDF

It turns out that any function \(F\) that satisfies the above properties is the CDF of some random variable. In fact, if \(U\) is a uniform random variable on \((0,1)\), then for a suitable function \(g : (0,1) \rightarrow \mathbb{R}\) we will have that \(X = g(U)\) satisfies \(F_X = F\).

3.1 Theorem

Let \(F\) be a given distribution function. Consider the probability space \(([0,1], \mathcal{B}, \mathbb{P})\), where \(\mathcal{B}\) is the Borel \(\sigma\)-algebra on \([0,1]\) and \(\mathbb{P}\) is the Lebesgue measure (see notes on Lebesgue Measure). Then, there exists a measurable function \(X\), i.e. a random variable, such that \(F_X = F\).

3.1.1 proof

We give a version of the proof under the additional assumption that \(F\) is continuous and strictly increasing. (The more general proof can be found in the linked notes.) Then, the range of \(F\) is \((0,1)\) and \(F\) is invertible. We define the uniform random variable \(U(\omega) = \omega\) for all \(\omega\), and we define our random variable by \(X(\omega) = F^{-1}(\omega)\) for every \(\omega \in (0,1)\). In other words, \(X(\omega) = F^{-1}(U(\omega))\). Basically, \(F^{-1}(u)\) says "this is the \(c\) such that \(\mathbb{P}(\{X \leq c\}) = u\)". This is very similar to what happens in inverse transform sampling.

Let's do a quick type check:

  • \(U : [0,1] \rightarrow [0,1]\) (the identity map)
  • \(F : \mathbb{R} \rightarrow [0,1]\) (but only invertible on \((0,1)\))
  • \(X = F^{-1}(U) : (0,1) \rightarrow \mathbb{R}\)

Note that \(F(F^{-1}(\omega)) = \omega\), so that \(F(X) = U\). Since \(F\) is strictly increasing, we have \(X(\omega) \leq x\) if and only if \(F(X(\omega)) \leq F(x)\), that is, \(U(\omega) \leq F(x)\). By the way, this also tells us that \(X\) is a valid (i.e. measurable) random variable: the event \(\{X \leq x\}\) is the same as \(\{\omega : \omega \leq F(x)\}\), which is a Borel set.

So, for every \(x \in \mathbb{R}\): \[ F_X(x) = \mathbb{P}(\{X \leq x\}) = \mathbb{P}(\{F(X) \leq F(x)\}) = \mathbb{P}(\{U \leq F(x)\}) = F(x) \] where the last step uses the fact that \(\mathbb{P}\) is the Lebesgue measure, so \(\mathbb{P}(U \leq u) = u\) for every \(u \in [0,1]\).

To recap, we now have a random variable \(X\) with CDF \(F_X = F\), as desired.
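Since the proof is constructive, it translates directly into a sampler; this is exactly inverse transform sampling. Here is a minimal sketch, again with the (arbitrary) choice of Exponential(1): on \(x > 0\) its CDF \(F(x) = 1 - e^{-x}\) is continuous and strictly increasing, with inverse \(F^{-1}(u) = -\ln(1 - u)\):

  import numpy as np

  rng = np.random.default_rng(0)

  # Target CDF F(x) = 1 - exp(-x): continuous and strictly increasing on
  # x > 0, with inverse F^{-1}(u) = -log(1 - u) for u in (0, 1)
  def F(x):
      return 1 - np.exp(-x)

  def F_inv(u):
      return -np.log(1 - u)

  u = rng.uniform(size=100_000)  # U(omega) = omega, uniform on (0, 1)
  x = F_inv(u)                   # X = F^{-1}(U)

  # F_X should coincide with F
  for c in [0.5, 1.0, 2.0]:
      print(f"P(X <= {c}) ~ {np.mean(x <= c):.4f}, F({c}) = {F(c):.4f}")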

4 Corollary

It turns out that there is a one-to-one correspondence between cumulative distribution functions and the probability laws of random variables. Equivalently, there is a one-to-one correspondence between distribution functions \(F\) and probability measures \(\mathbb{P}\) on \((\mathbb{R}, \mathcal{B})\).

4.1 proof

First, we show that this mapping is surjective: every distribution function \(F\) has a corresponding probability measure. By the above theorem, for any CDF \(F\), we can find a r.v. \(X\) such that \(F_X = F\). But each \(X\) also induces a probability measure \(\mathbb{P}_X\) on \((\mathbb{R}, \mathcal{B})\). And given this measure, we can recover the CDF by defining \(F(c) = \mathbb{P}_X((-\infty, c])\).

Now, we show that this correspondence is injective: different probability measures \(\mathbb{P}_X\) and \(\mathbb{P}_{X'}\) necessarily yield different CDFs. Indeed, if two measures \(\mathbb{P}_X\) and \(\mathbb{P}_{X'}\) coincide on all intervals \((-\infty, c]\), then they are equal on all other Borel sets as well, because the collection of intervals \((-\infty, c]\) is a generating \(\pi\)-system for \(\mathcal{B}\) (see Carathéodory's Extension Theorem).
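As a small numerical illustration of recovering the law from the CDF (a sketch, again using the standard normal from scipy): the measure of a half-open interval \((a, b]\) is \(F(b) - F(a)\), and this matches the probability computed directly from the law.

  from scipy.stats import norm

  F = norm.cdf  # standard normal CDF; its law is a measure on (R, B)

  # The CDF pins down the measure on the pi-system {(-inf, c]}, hence
  # on half-open intervals: P((a, b]) = F(b) - F(a)
  a, b = -1.0, 1.0
  print(F(b) - F(a))              # ~ 0.6827

  # Cross-check via the survival function: P(X > a) - P(X > b)
  print(norm.sf(a) - norm.sf(b))  # same number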

Created: 2021-09-14 Tue 21:44