Today I ran Nmix 20 times on bootstrapped Segue I data.
Once again the results were rather discouraging. Running Nmix on the full data set yielded a single-population probability of 70%, yet the bootstrapped samples never came close to that, averaging a single-population probability of 10%:
7/20 (35%) had a single-population prob. of over 10%
Only 3 of those (15% of the 20) had a prob. of over 20%
Still, the result was way better than the others, as only 7 (35%) had no discernible single- or double-component structure, i.e. neither reached over 10% prob. And only 3 of those had a 3-component probability of less than 10%.
So 17/20 had at least a decent confidence for up to 3 components, even if the results weren't as overwhelmingly single population as with the actual data.
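For reference, the bootstrap step can be sketched roughly as below. This is only a minimal sketch, assuming the velocities sit in a NumPy array and each bootstrap sample is drawn with replacement at the same size as the original. The function name `bootstrap_samples` and the example velocities are placeholders (not the real Segue I measurements), and the Nmix run itself is not shown.

```python
import numpy as np

def bootstrap_samples(velocities, n_boot=20, seed=0):
    """Return n_boot resamples (with replacement) of the velocity array."""
    rng = np.random.default_rng(seed)
    velocities = np.asarray(velocities, dtype=float)
    n = velocities.size
    # Each row of idx picks n indices at random, so each row of the
    # result is one bootstrap sample of the same size as the data.
    idx = rng.integers(0, n, size=(n_boot, n))
    return velocities[idx]

# Placeholder velocities in km/s, NOT the real Segue I data.
example_velocities = np.array([198.0, 203.5, 206.1, 208.4, 209.9, 211.2, 215.7])
samples = bootstrap_samples(example_velocities)
```

Each row of `samples` would then be written out and fed to Nmix separately.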
Next I generated a normally distributed sample of velocities based on the Segue I data, then bootstrapped and Nmixed it as I did with the real data sets. The results seemed to show that Nmix usually works pretty well. The average confidence level for 1 component was 40%, which is way higher than that for the bootstrapped actual data. This of course makes sense, as it is a true Gaussian distribution and should have a high prob. of a single component. But in 4/20 runs (20%) it still didn't seem to detect the single group.
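A mock sample like this could be generated along the following lines. This is a sketch under the assumption that "based on the Segue I data" means matching the sample mean and standard deviation of the observed velocities and drawing the same number of points. The function name `gaussian_mock` and the input values are placeholders.

```python
import numpy as np

def gaussian_mock(velocities, seed=0):
    """Draw a Gaussian sample matching the mean, spread and size of the data."""
    rng = np.random.default_rng(seed)
    velocities = np.asarray(velocities, dtype=float)
    mean = velocities.mean()
    sigma = velocities.std(ddof=1)  # sample standard deviation
    return rng.normal(mean, sigma, size=velocities.size)

# Placeholder velocities in km/s, NOT the real Segue I data.
observed = np.array([198.0, 203.5, 206.1, 208.4, 209.9, 211.2, 215.7])
mock = gaussian_mock(observed)
```

The resulting `mock` array can then go through the same bootstrap and Nmix steps as the real data.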
Normally Distributed Data with Added Uncertainty
I then added in the uncertainties (excluding the data point with the ridiculous 147.99 km/s uncertainty) and bootstrapped and Nmixed once more. This time the average confidence level for 1 component was only 22%, though that is still way higher than what it was for the actual data. However, it missed the single population (where detecting it means a confidence of over 10%) 60% of the time. Except for one case (5% of the time), whenever the one-component fit didn't work it detected a 2-component fit with a high probability. This really makes me question the effectiveness of Nmix, as its results are either spot-on or a complete miss depending on the bootstrapped sample.
Now looking at the graphs, the times when it detected 2 groups were when there was one point (or 2 points if that point happened to be resampled in the bootstrap) that landed really far away due to its uncertainty, far enough away to not be considered part of the single population. So I guess Nmix did its job as well as it could given those outliers.
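One way to picture how the uncertainties create those outliers is the sketch below. It assumes each velocity is shifted by a Gaussian draw whose width is that point's own measurement uncertainty, and that points above some cutoff uncertainty are dropped first. The function name `add_measurement_noise`, the 100 km/s cutoff, and all input values except the 147.99 km/s uncertainty are placeholders.

```python
import numpy as np

def add_measurement_noise(velocities, uncertainties, max_uncertainty=None, seed=0):
    """Scatter each velocity by a Gaussian draw scaled to its own uncertainty."""
    rng = np.random.default_rng(seed)
    v = np.asarray(velocities, dtype=float)
    e = np.asarray(uncertainties, dtype=float)
    if max_uncertainty is not None:
        # Drop points whose uncertainty exceeds the (assumed) cutoff.
        keep = e <= max_uncertainty
        v, e = v[keep], e[keep]
    # One Gaussian offset per point, with standard deviation e[i].
    return v + rng.normal(0.0, e), e

# Placeholder values in km/s; only the 147.99 uncertainty is from the notes above.
mock_v = np.array([205.0, 207.3, 208.8, 210.1, 212.4])
mock_e = np.array([2.1, 3.4, 147.99, 1.8, 15.0])
noisy_v, kept_e = add_measurement_noise(mock_v, mock_e, max_uncertainty=100.0)
```

A point with a large uncertainty (like the 15.0 placeholder here) can easily end up several intrinsic sigma from the rest, which is the kind of stray point that would push a mixture fit toward 2 components.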
Measurement uncertainty sucks.