
sufficient statistic

1 definition

Let \(f(x\mid \theta)\) be a family of distributions parameterized by \(\theta\), and let \(X=\{x_1,...,x_n\}\) be a set of samples drawn from a given \(f(x\mid \theta)\). Then \(T(X)\) is a sufficient statistic with respect to \(\theta\) if no other statistic computed from the sample provides any additional information about \(\theta\). In other words, as far as getting an estimate of \(\theta\) is concerned, we can throw away \(X\) and keep only \(T(X)\).

Formally, \(T(X)\) is a sufficient statistic if the conditional distribution of the sample given \(T(X)\), \(P_{X\mid T(X)}(x \mid t)\), does not depend on \(\theta\). This means that if we ever do maximum likelihood estimation of \(\theta\), once we know \(T(X)\) we don't need to know anything else about the sample – no further detail of \(X\) changes the conditional distribution.
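As a quick numerical sketch (a standard Bernoulli example of my own, not from the original note): for \(n\) i.i.d. Bernoulli(\(\theta\)) samples with \(T(X)=\sum_i x_i\), the conditional probability of any particular sequence given its sum works out to \(1/\binom{n}{t}\), which is independent of \(\theta\):

```python
from math import comb

def cond_prob(x, theta):
    """P(X = x | T(X) = t) for i.i.d. Bernoulli(theta) samples,
    where T(X) = sum(x). Computed as joint P(X = x) over marginal P(T = t)."""
    n, t = len(x), sum(x)
    joint = theta**t * (1 - theta)**(n - t)                   # P(X = x)
    marginal = comb(n, t) * theta**t * (1 - theta)**(n - t)   # P(T(X) = t)
    return joint / marginal

x = [1, 0, 1, 1, 0]  # a particular sample with t = 3, n = 5
for theta in (0.2, 0.5, 0.9):
    print(cond_prob(x, theta))  # 0.1 = 1/C(5,3) every time, whatever theta is
```

The \(\theta\)-dependent factors cancel between the joint and the marginal, which is exactly why knowing the individual \(x_i\) beyond their sum tells us nothing more about \(\theta\).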

2 Fisher–Neyman factorization theorem

The function \(T(x)\) is a sufficient statistic if and only if the density \(f(x\mid \theta)\) can be factorized as: \[ f(x\mid \theta) = h(x)\,g(T(x), \theta) \] where \(h\) does not depend on \(\theta\). Again, if we are making a likelihood inference about what \(\theta\) could be, we only need to pay attention to how \(g(T(x), \theta)\) varies with \(\theta\); the factor \(h(x)\) is constant in \(\theta\).
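As a worked example (again the Bernoulli case, my own illustration): for \(n\) i.i.d. Bernoulli(\(\theta\)) samples the joint density factorizes as \[ f(x\mid \theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \underbrace{1}_{h(x)} \cdot \underbrace{\theta^{T(x)}(1-\theta)^{n-T(x)}}_{g(T(x),\, \theta)} \] with \(T(x)=\sum_{i=1}^{n} x_i\). Since \(\theta\) enters only through \(g\), and \(g\) depends on the data only through the sum \(T(x)\), the theorem says the sample sum is a sufficient statistic for \(\theta\).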

3 sources

Created: 2021-09-14 Tue 21:44