bootstrapping (statistics)
From the Wikipedia page
1 approach
Given a sample from a population, we can make an inference (sample \(\rightarrow\) population) about the population
- ex: "the population mean is the sample mean"
How reliable is that inference? One option is to assume that sample means are normally distributed and compute a confidence interval from a Gaussian.
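As a rough illustration of the Gaussian route, here is a minimal sketch using only the Python standard library. The function name `normal_mean_ci`, the synthetic height data, and the hard-coded z-value of 1.96 (for roughly 95% coverage) are assumptions made for this example, not something taken from the source.

```python
import math
import random
import statistics

def normal_mean_ci(sample, z=1.96):
    """Approximate CI for the population mean, assuming sample means are normal.

    z=1.96 corresponds to roughly 95% coverage under that assumption.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * sem, mean + z * sem

# Synthetic data (an assumption): 100 heights in cm
rng = random.Random(0)
heights = [rng.gauss(170, 10) for _ in range(100)]

low, high = normal_mean_ci(heights)
print(f"normal-approximation CI: ({low:.2f}, {high:.2f})")
```

The interval is symmetric around the sample mean by construction, which is exactly the assumption the bootstrap avoids.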
Bootstrapping approach: Model the population using the sample and model the inferences (sample \(\rightarrow\) population) using re-sampled inferences (re-sampled \(\rightarrow\) sample).
- So, grab 100 people and bin their heights – use this as a model of the population height distribution
- Draw with replacement from the sample to obtain re-samples
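- Now, compute the mean of each re-sample: how far do these re-sample means fall from the sample mean? The range covering, say, the middle 95% of the re-sample means serves as the 95% confidence interval (the percentile method).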
- Key: we know the sample mean, so we can directly check the validity of a given inference (re-sampled \(\rightarrow\) sample).
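The steps above can be sketched in a few lines of standard-library Python. This is only one simple variant (the percentile bootstrap for the mean); the function name `bootstrap_mean_ci`, the number of re-samples, the seed, and the synthetic heights are all assumptions chosen for illustration.

```python
import random
import statistics

def bootstrap_mean_ci(sample, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for the mean (a minimal sketch)."""
    rng = random.Random(seed)
    n = len(sample)
    # Draw with replacement from the sample and record each re-sample mean
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Keep the central `confidence` fraction of the sorted re-sample means
    alpha = 1.0 - confidence
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Synthetic data (an assumption): 100 heights in cm
rng = random.Random(0)
heights = [rng.gauss(170, 10) for _ in range(100)]

low, high = bootstrap_mean_ci(heights)
print(f"sample mean: {statistics.fmean(heights):.2f}")
print(f"bootstrap percentile CI: ({low:.2f}, {high:.2f})")
```

Because the re-samples are drawn from the sample itself, no distributional assumption about the population is needed beyond the sample being representative of it.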