
Gini impurity

Consider a set \(S\) whose samples belong to \(J\) classes, and let \(p_i\) be the proportion of samples in \(S\) that belong to the \(i\)-th class. Imagine drawing a sample at random from this set and assigning it a label at random, according to the distribution of classes in the set. What is the probability that you misclassify the sample? With probability \(p_i\) you draw a sample from the \(i\)-th class, and with probability \((1-p_i)\) the randomly chosen label is not \(i\), so the sample is misclassified. The probability of drawing a sample of class \(i\) and misclassifying it is therefore \(p_i(1-p_i)\).

Summing these probabilities over all classes gives the Gini impurity: \[ \sum_{i=1}^{J} p_i(1-p_i) \]
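
As a short worked step, not part of the original derivation: because the proportions sum to one, the same quantity can be written in an equivalent form. \[ \sum_{i=1}^{J} p_i(1-p_i) = \sum_{i=1}^{J} p_i - \sum_{i=1}^{J} p_i^2 = 1 - \sum_{i=1}^{J} p_i^2 \]

For example, with two equally likely classes (\(p_1 = p_2 = 0.5\)) the impurity is \(1 - (0.25 + 0.25) = 0.5\), while a set containing a single class (\(p_1 = 1\)) has impurity \(0\).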

Remember: here \(p_i\) is the proportion of samples from the \(i\)-th class in \(S\), so the \(p_i\) sum to 1.
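
The formula can be sketched in a few lines of Python. This is a minimal illustration, assuming the set \(S\) is given as a list of class labels; the function name `gini_impurity` is a hypothetical helper chosen for this example, not something from a particular library.

```python
from collections import Counter


def gini_impurity(labels):
    """Return sum_i p_i * (1 - p_i) over the classes present in `labels`.

    `labels` is assumed to be a non-empty sequence of hashable class labels,
    one per sample in the set S.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("labels must be non-empty")
    # p_i is the proportion of samples that carry the i-th class label.
    counts = Counter(labels)
    return sum((c / n) * (1 - c / n) for c in counts.values())
```

Calling `gini_impurity(["a", "a", "b", "b"])` evaluates the sum for two equally sized classes, and a list containing only one distinct label gives an impurity of zero.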