
confidence interval

Mostly from the Wikipedia page.

1 formal definition

  • We have a dataset of samples \(x_1, ..., x_n\) sampled from random variables \(X_1, ..., X_n\)
  • Let \(\theta\) be some parameter that we are interested in, e.g. \(\mu\)
  • Let \(L(X_1,...,X_n)\) and \(U(X_1,...,X_n)\) be two statistics – note that they are also random variables
  • Let \(0 \leq \gamma \leq 1\) – this is the confidence level
  • Then, \(L\) and \(U\) define a confidence interval for confidence level \(\gamma\) if:

\[ P(L < \theta < U) = \gamma \] for every \(\theta\)

Then, from an observed dataset, we can compute a specific realized interval \((L(x_1,...,x_n), U(x_1,...,x_n))\).
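As a minimal sketch (not from the source), here is the standard two-sided \(z\)-interval for \(\theta = \mu\) when the standard deviation \(\sigma\) is assumed known; the function name `mean_ci` is my own:

```python
from statistics import NormalDist

def mean_ci(xs, sigma, gamma=0.95):
    """Two-sided confidence interval (L, U) for the mean, assuming the
    data are i.i.d. with known standard deviation sigma, so that
    P(L < mu < U) = gamma by the normality of the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    # z is the standard-normal quantile such that the central area is gamma
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half
```

For example, `mean_ci([1.0, 2.0, 3.0, 4.0], sigma=1.0)` is centered at the sample mean 2.5 with half-width \(1.96 \cdot 1/\sqrt{4} \approx 0.98\).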

1.1 commentary

Note that \(\theta\) is not a random variable. \(P(L < \theta < U)\) can be written \(P(L < \theta \wedge U > \theta)\).

\(P(L < \theta < U) = \gamma\) for every \(\theta\) means that no matter what the parameter's true value is, the interval computed from a sampled dataset will contain it for a \(\gamma\) fraction of such datasets.

Note that this does not mean that, for a particular sampled dataset, the interval contains the parameter with probability \(\gamma\). \(\gamma\) just tells us the reliability of the procedure that produces confidence intervals. As far as I can tell this is mostly a philosophical distinction. Neyman says that for a given realized dataset, there is no probability that the interval contains the true parameter: it is simply a fact that it either does or doesn't.
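This "reliability of the procedure" reading can be checked by simulation (a sketch of my own, not from the source): repeatedly draw datasets from a known distribution, build the interval each time, and count how often the interval covers the true mean. The fraction should be close to \(\gamma\).

```python
import random
from statistics import NormalDist

def coverage(mu=3.0, sigma=2.0, n=30, gamma=0.95, trials=20000, seed=0):
    """Estimate the coverage probability of the known-sigma z-interval:
    the fraction of simulated datasets whose interval contains mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    half = z * sigma / n ** 0.5  # interval half-width (same for every trial)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        # the interval is (xbar - half, xbar + half); check if it covers mu
        if xbar - half < mu < xbar + half:
            hits += 1
    return hits / trials
```

Note that the probability statement is about the random endpoints \(L\) and \(U\), not about \(\mu\): each individual interval either contains \(\mu\) or it doesn't, and only the long-run frequency over many repetitions is \(\gamma\).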

Think of a roulette wheel. The casino knows that a given bet wins a fraction \(\gamma\) of the time. Suppose the winning number of a particular spin is 1. This does not imply that there is a \(\gamma\) chance that the gambler picked 1 on that spin: \(\gamma\) describes the long-run behavior of the wheel, not any single realized outcome.

Created: 2021-09-14 Tue 21:44