The point of the paper is to demonstrate how the KMM (Kaye's Mixture Model; I finally figured out what it stands for!) algorithm can be used to detect bimodality (two groups) in a data set.
The user specifies the number of groups they expect, along with an estimated mean and standard deviation for each. If the program works as it should, it will converge to the correct values regardless of the initial estimates.
The program first finds the best-fit single Gaussian for the data, then calculates the likelihood of that fit.
It then attempts to divide the data points into the specified number of groups by calculating the probability that each point belongs to each group and assigning it to the group with the highest probability. The group parameters are re-estimated and the assignment is repeated until the number of objects in each group stops changing. The likelihood of this new fit is then compared with the initial single-Gaussian likelihood to get the likelihood ratio test statistic (LRTS), which estimates the improvement in going from a single-group to a multiple-group fit.
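The steps above can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions, not the actual KMM code: the function names, the use of maximum-likelihood variances, the group weights taken as the fraction of points in each group, and the definition LRTS = 2 (ln L_groups - ln L_single) are all choices made for this sketch.

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    """Log of the Gaussian density at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def mean_std(values):
    """Maximum-likelihood mean and standard deviation (divides by n, not n - 1)."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)

def single_gaussian_loglike(data):
    """Log-likelihood of the best-fit single Gaussian."""
    mu, sigma = mean_std(data)
    return sum(normal_logpdf(x, mu, sigma) for x in data)

def hard_assign_fit(data, means, sigmas, max_iter=200):
    """Assign each point to its most probable group, re-estimate the groups,
    and repeat until the group sizes stop changing."""
    k = len(means)
    weights = [1.0 / k] * k  # assumption: start with equal group weights
    counts = None
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x in data:
            scores = [math.log(weights[j]) + normal_logpdf(x, means[j], sigmas[j])
                      for j in range(k)]
            groups[scores.index(max(scores))].append(x)
        new_counts = [len(g) for g in groups]
        if min(new_counts) < 2:
            raise ValueError("a group collapsed; try different initial guesses")
        fitted = [mean_std(g) for g in groups]
        means = [m for m, _ in fitted]
        sigmas = [s for _, s in fitted]
        if min(sigmas) <= 0.0:
            raise ValueError("a group has zero spread; try different initial guesses")
        weights = [c / len(data) for c in new_counts]
        if new_counts == counts:
            break
        counts = new_counts
    return means, sigmas, weights

def mixture_loglike(data, means, sigmas, weights):
    """Log-likelihood of the data under the fitted multi-group model."""
    total = 0.0
    for x in data:
        density = sum(w * math.exp(normal_logpdf(x, m, s))
                      for m, s, w in zip(means, sigmas, weights))
        total += math.log(density)
    return total

def lrts(data, means, sigmas):
    """Twice the gain in log-likelihood from one group to several."""
    m, s, w = hard_assign_fit(data, means, sigmas)
    return 2.0 * (mixture_loglike(data, m, s, w) - single_gaussian_loglike(data))

def demo_data(seed=0):
    """Made-up test data: two well-separated groups of 200 points each."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(200)] +
            [rng.gauss(6.0, 1.0) for _ in range(200)])
```

Calling `hard_assign_fit(demo_data(), [1.0, 4.0], [1.0, 1.0])` with deliberately poor initial guesses should still recover group means near 0 and 6, which is the "converges no matter the initial estimates" behavior described above.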
But that statistic is not nearly as useful as the "P-value," which is the "probability that the LRTS would be at least as large as the observed value if the null hypothesis were true." In most cases the null hypothesis is that the data come from a single Gaussian, so this is the probability of getting an LRTS at least that large from a true single-group Gaussian rather than something with multiple groups. So, the smaller the better: a small P-value is stronger evidence against a single group. A P-value of .05 means that there is only a 5% chance of getting an LRTS that large when looking at a single group.
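When a chi-squared reference distribution is appropriate, turning an LRTS into a P-value is just the upper-tail probability of that distribution. The sketch below uses the closed form that exists for an even number of degrees of freedom, so it needs no external library. The function name is made up, and the number of degrees of freedom is left as an input because it depends on the models being compared.

```python
import math

def chi2_upper_tail_even_dof(statistic, dof):
    """P(X >= statistic) for a chi-squared variable with an even number of
    degrees of freedom: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!"""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form only covers positive, even dof")
    if statistic <= 0.0:
        return 1.0
    half = statistic / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

For example, with 2 degrees of freedom an LRTS of about 5.99 corresponds to a P-value of about 0.05.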
Double Root Residuals (DRR)
This is a fancy method of comparing the data with a specified model distribution. It essentially compares the square root of the number of data points in each bin to the square root of the expected value from the model and then plots the result. It's a little more complicated than that, but that's the basic idea. If the magnitude of the result is greater than about 2 in a given bin, then there is a discrepancy between the model and the data at the 2 sigma (95%) level. This can be used to compare the data to a unimodal model to see if they strongly disagree.
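Here is a small sketch of the calculation. The exact formula is an assumption on my part: it is the commonly quoted definition of the double root residual, sqrt(2 + 4 * observed) - sqrt(1 + 4 * expected) for non-empty bins and 1 - sqrt(1 + 4 * expected) for empty ones, and should be checked against the paper. The function names and the choice of a single Gaussian as the model are also just for illustration.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def histogram(data, edges):
    """Count points per bin; bins are [lo, hi), the last bin also includes its top edge."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    return counts

def expected_counts(n, edges, mu, sigma):
    """Counts a Gaussian model predicts in each bin for n data points."""
    return [n * (normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma))
            for lo, hi in zip(edges, edges[1:])]

def double_root_residuals(observed, expected):
    """One residual per bin; |value| above about 2 flags a 2 sigma discrepancy."""
    residuals = []
    for obs, model in zip(observed, expected):
        if obs >= 1:
            residuals.append(math.sqrt(2.0 + 4.0 * obs) - math.sqrt(1.0 + 4.0 * model))
        else:
            residuals.append(1.0 - math.sqrt(1.0 + 4.0 * model))
    return residuals
```

A bin where the observed count matches the model gives a residual near 0, while a bin with 100 points where the model expects 50 gives a residual well above 2.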
The authors also use both skewness and kurtosis to measure the Gaussian-ness of the data. Skewness measures the symmetry of the data (a skew of 0 is perfectly symmetric; negative skew means that most of the mass is concentrated to the right, with a longer tail to the left). Kurtosis measures the "peakedness" of the data. With kurtosis defined so that a pure Gaussian has a value of 0, a positive kurtosis means that the data are pointier than a Gaussian, and a negative kurtosis means that the peak is flatter. Thus data sets with multiple groups typically have negative kurtosis because the central peak is spread out.
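Both quantities are easy to compute from the central moments of the data. The sketch below uses the simple moment-based (biased) estimators and subtracts 3 from the kurtosis so that a Gaussian scores 0; the function name is made up, and the paper may use slightly different estimators.

```python
def skewness_and_excess_kurtosis(data):
    """Moment-based skewness and excess kurtosis (a Gaussian gives 0 for both)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis
```

As an extreme case of two groups, a data set with half its points at -1 and half at +1 has zero skewness and an excess kurtosis of -2, the most negative value possible.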
Paper about Skewness and Kurtosis
While any one of these tests may give a false indication of bimodality or unimodality, when they are all combined they are very effective in determining the number of groups in a data set.
The P-value can only be obtained by comparing the LRTS to a chi-squared distribution when both groups have the same variance. If they don't, bootstrapping must be done, which involves taking a random sample of the data with replacement and then calculating its statistics. I'm not entirely sure how this works, but I don't think I need to do it.
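The resampling step itself is simple, even if the details of how the paper uses it are not covered here. The sketch below only shows the basic mechanics: draw many samples of the same size from the data with replacement and recompute a statistic on each. The function name and the `statistic` argument are hypothetical; how the resulting values are turned into a P-value is left open.

```python
import random

def bootstrap_statistic(data, statistic, n_boot=1000, seed=0):
    """Recompute `statistic` on n_boot resamples of the data drawn with replacement.

    `statistic` is any function that maps a list of numbers to a single number,
    for example a mean or an LRTS calculation.
    """
    rng = random.Random(seed)
    n = len(data)
    values = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # sampling with replacement
        values.append(statistic(resample))
    return values
```

The spread of the returned values gives a sense of how much the statistic would vary between data sets like the observed one.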
There is also the problem of determining just how many groups are in the data set. The P-value is a useful guide to the significance of adding a group, but for a more accurate answer you need to bootstrap again. At the time of the paper's writing (1994) there was no single powerful technique to determine the number of groups in a data set, but it is generally accepted that the model that works well with the fewest groups is the best model. This follows from Occam's razor.