This post is based on the paper "On Bayesian Analysis of Mixtures with an Unknown Number of Components" by Richardson and Green.

Now, I have no idea what a reversible jump MCMC actually does, despite my efforts to learn. But I have learned that it can somehow be used along with Bayesian inference to infer the parameters of a mixture model from a data set.

This method, implemented by Peter Green in a program called Nmix, somehow sweeps through different numbers of components (k in the paper), estimating the probability of each value of k. How it does this, I have no idea, but it involves a "birth-death move." From those probabilities you can then make an educated guess at the true number of components.
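To get some intuition for what a "birth-death move" might look like, here is a toy sketch I put together. This is emphatically not Nmix (the real sampler also proposes parameters for new components and conditions on data); it is just a Metropolis-Hastings chain whose only state is k, where a "birth" proposes k → k+1 and a "death" proposes k → k−1, targeting a Poisson prior truncated to k ≥ 1. The point is only to show how a chain can hop between model sizes and spend time at each k in proportion to its probability.

```python
import random

def birth_death_chain(lam=3.0, n_sweeps=50000, seed=0):
    """Toy Metropolis-Hastings chain over k alone (NOT Nmix).
    Target: Poisson(lam) truncated to k >= 1.
    'Birth' proposes k -> k+1, 'death' proposes k -> k-1."""
    rng = random.Random(seed)
    k = 1
    samples = []
    for _ in range(n_sweeps):
        # At k == 1 only a birth is possible; otherwise choose 50/50.
        birth = True if k == 1 else rng.random() < 0.5
        if birth:
            q_forward = 1.0 if k == 1 else 0.5  # prob of proposing this birth
            q_reverse = 0.5                     # prob of the reverse death from k+1
            # Poisson ratio p(k+1)/p(k) = lam / (k+1).
            accept = (lam / (k + 1)) * (q_reverse / q_forward)
            if rng.random() < min(1.0, accept):
                k += 1
        else:
            q_forward = 0.5                         # prob of proposing this death
            q_reverse = 1.0 if k - 1 == 1 else 0.5  # reverse birth from k-1
            # Poisson ratio p(k-1)/p(k) = k / lam.
            accept = (k / lam) * (q_reverse / q_forward)
            if rng.random() < min(1.0, accept):
                k -= 1
        samples.append(k)
    return samples
```

Over many sweeps, the fraction of time the chain spends at each k approximates the target distribution over k, which is roughly the idea behind reading off posterior probabilities for k from an Nmix run.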

Unfortunately, it relies on a lot of parameters that I don't understand yet and that aren't really explained in the paper: things like lambda, delta, and nu, which are referred to as "hyperparameters." According to Wikipedia, these are parameters of the prior distribution, which encodes what is believed about the data before any observations are made.
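To make "hyperparameters of a prior" a bit more concrete, here is a toy sketch of drawing the parameters of a k-component Gaussian mixture from a prior of the general shape used for mixture models: Dirichlet weights, Normal means, Inverse-Gamma variances. The symbols delta, xi, kappa, alpha, and beta play the role of hyperparameters; the numeric values below are placeholders I made up for illustration, not the paper's defaults.

```python
import random

def draw_mixture_from_prior(k=3, delta=1.0, xi=0.0, kappa=0.25,
                            alpha=2.0, beta=1.0, seed=0):
    """Draw k-component Gaussian mixture parameters from a generic prior.
    Hyperparameter values here are illustrative placeholders, NOT the
    paper's defaults:
      weights   ~ Dirichlet(delta, ..., delta)
      means     ~ Normal(xi, 1/kappa)
      variances ~ Inverse-Gamma(alpha, beta)
    """
    rng = random.Random(seed)
    # Dirichlet draw via normalized Gamma variates.
    g = [rng.gammavariate(delta, 1.0) for _ in range(k)]
    total = sum(g)
    weights = [x / total for x in g]
    # Component means, centered on xi with precision kappa.
    means = [rng.gauss(xi, (1.0 / kappa) ** 0.5) for _ in range(k)]
    # Inverse-Gamma(alpha, beta) as the reciprocal of a Gamma variate
    # with shape alpha and scale 1/beta.
    variances = [1.0 / rng.gammavariate(alpha, 1.0 / beta) for _ in range(k)]
    return weights, means, variances
```

So "using the defaults" just means plugging in the authors' recommended values for numbers like delta and kappa before the sampler ever sees the data.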

Defaults for these values are listed, and since I don't really know what each one means individually (other than being the hyperparameter for some other variable), I guess I'll just use the defaults. The authors justify them with a somewhat incomprehensible statistical analysis that I assume they understand, so I'll trust them.

There are also some justifications of the method itself, arguing that after enough sweeps the distribution of k should converge regardless of the starting values.

Well, that seems to be about it. After lunch I think I'll start messing around with the program to see what it does.


Does Nmix actually give you an idea of how many components a distribution of data may have been drawn from? Or does Nmix just tell you whether or not it looks like a set of data have been drawn from a single population?
