
properties of expectation

From the 6.436 lecture notes here.

Recall that a random variable \(X\) on a probability space \((\Omega, \mathcal{F}, \mathbb{P})\) is discrete if the range of \(X\) is finite or countable.

1 Definition: expectation

A discrete random variable \(X\) with a PMF \(p_X\) has expectation \[ \mathbb{E}[X] = \sum_{x} x p_X(x) \] whenever the sum is well defined. (The sum is taken over the range of \(X\), which is countable.)
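As a quick sanity check of the definition, take \(X\) to be the outcome of a fair six-sided die, so \(p_X(k) = 1/6\) for \(k = 1, \dots, 6\): \[ \mathbb{E}[X] = \sum_{k=1}^{6} k \cdot \frac{1}{6} = \frac{21}{6} = 3.5 \]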

2 Fact: alternative formula

If \(X\) only takes non-negative integer values, then \[ \mathbb{E}[X] = \sum_{n\geq 0}\mathbb{P}(X > n) \]

2.1 proof

\[\begin{align*} \sum_{n\geq 0 } \mathbb{P}(X > n) &= \sum_{n\geq 0} \sum_{i=n+1}^{\infty} \mathbb{P}(X = i)\\ &=\sum_{i\geq 1} i\, \mathbb{P}(X=i)\\ &=\mathbb{E}[X] \end{align*}\]

The second line comes from the fact that the term \(\mathbb{P}(X=i)\) appears exactly \(i\) times in the double sum, once for each \(n \in \{0, 1, \dots, i-1\}\).
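As an example of the formula in action, let \(X\) be geometric with parameter \(p\) on \(\{1, 2, \dots\}\), so that \(\mathbb{P}(X > n) = (1-p)^n\). Then \[ \mathbb{E}[X] = \sum_{n \geq 0} (1-p)^n = \frac{1}{p} \] which matches the usual formula for the mean of a geometric random variable.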

3 Expectation of a function of a random variable: \(g(X)\)
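If \(g\) is a function, then \(Y = g(X)\) is itself a discrete random variable, and \[ \mathbb{E}[g(X)] = \sum_{x} g(x) p_X(x) \] whenever the sum is well defined. This is the law of the unconscious statistician used in the section on moments below: it lets us compute \(\mathbb{E}[g(X)]\) directly from \(p_X\), without first working out the PMF of \(Y\).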

4 iterated expectation
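If \(A_1, \dots, A_n\) are events that partition \(\Omega\), each with \(\mathbb{P}(A_i) > 0\), then, using the conditional expectation defined in section 8 below, \[ \mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{P}(A_i)\, \mathbb{E}[X \mid A_i] \] This is the total expectation theorem: the overall expectation is a weighted average of the conditional expectations over the pieces of the partition.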

5 linearity of expectation

For \(a,b\in \mathbb{R}\), we have \(\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]\), provided the sums are well defined.
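A short derivation from the joint PMF, using the fact that \(\sum_y p_{X,Y}(x,y) = p_X(x)\) (and similarly for \(Y\)); note that no independence is required: \[\begin{align*} \mathbb{E}[aX + bY] &= \sum_{x,y} (ax + by)\, p_{X,Y}(x,y)\\ &= a\sum_{x} x\, p_X(x) + b\sum_{y} y\, p_Y(y)\\ &= a\mathbb{E}[X] + b\mathbb{E}[Y] \end{align*}\]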

6 product of independent r.v.'s

If \(X\) and \(Y\) are independent, then \(\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]\). This is because independence gives \(p_{X,Y}(x,y) = p_X(x)p_Y(y)\), so \[\begin{align*} \mathbb{E}[XY] &= \sum_{x,y} xy\, p_{X,Y}(x,y)\\ &= \sum_{x} x\, p_{X}(x) \sum_{y} y\, p_{Y}(y) \\ &= \mathbb{E}[X]\mathbb{E}[Y] \end{align*}\]
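As a concrete check, let \(X\) take values \(0\) and \(1\) with probability \(1/2\) each, and let \(Y\) be an independent fair die roll. The terms with \(x = 0\) contribute nothing, so \[ \mathbb{E}[XY] = \sum_{y=1}^{6} (1 \cdot y)\, \tfrac{1}{2} \cdot \tfrac{1}{6} = \tfrac{21}{12} = 1.75 = \tfrac{1}{2}\cdot 3.5 = \mathbb{E}[X]\mathbb{E}[Y] \]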

7 moments

Let \(Y=X^2\). By the law of the unconscious statistician, we have \(\mathbb{E}[Y] = \mathbb{E}[X^2] = \sum_x x^2 p_X(x)\).

\(\mathbb{E}[X^2]\) is called the second moment of \(X\).

The quantity \(\mathbb{E}[(X-\mathbb{E}[X])^r]\) is called the \(r\)-th central moment of \(X\).

The second central moment \(\mathbb{E}[(X-\mathbb{E}[X])^2]\) is called the variance of \(X\). See this interesting Stack Overflow question about why the variance is so often used to measure distance from the mean (as opposed to the absolute value or some other norm). One thing that stood out to me was that the variance is essentially the \(\ell^2\) distance between a vector of samples \(X_i\) and the vector \(\mu\mathbf{1}\) (in the limit, as the number of samples increases, this recovers the expectation).
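Expanding the square and using linearity gives the usual shortcut for computing the variance from the first two moments (writing \(\mu = \mathbb{E}[X]\)): \[ \mathbb{E}[(X-\mu)^2] = \mathbb{E}[X^2] - 2\mu\,\mathbb{E}[X] + \mu^2 = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \]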

The square root of the variance is the standard deviation.

8 definition: conditional expectation

Let \(A\) be an event with \(\mathbb{P}(A)>0\). Let \(X\) be a random variable. Then, the conditional expectation of \(X\) given \(A\) is: \[ \mathbb{E}[X\mid A] = \sum_{x} xp_{X\mid A}(x) \] where \(p_{X\mid A}(x) = \mathbb{P}(X = x \mid A)\) is the conditional PMF of \(X\) given \(A\).
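For example, let \(X\) be a fair die roll and let \(A = \{X \text{ is even}\}\). Then \(p_{X \mid A}(x) = 1/3\) for \(x \in \{2, 4, 6\}\) and \(0\) otherwise, so \[ \mathbb{E}[X \mid A] = \tfrac{1}{3}(2 + 4 + 6) = 4 \]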

Created: 2021-09-14 Tue 21:44