Wednesday, July 29, 2009

Willman I Gradient

I've spent the last two weeks messing around with finding the best fit slope for Willman I's velocity distribution along its major axis.

The best fit has a slope of -0.484 km/s per arcminute.

But I wanted to figure out how likely this is, so I ran a couple of simulations. Based on Willman I's mean velocity, position, and spread in spatial points, I simulated 100,000 fake galaxies with specified velocity dispersions.

With a velocity dispersion of 1 km/s, an absolute slope greater than 0.484 is never (or almost never) found.

With a velocity dispersion of 5 km/s, the slope exceeds that value 17.01% of the time; with 10 km/s, 48.75% of the time.

The velocity dispersion of Willman I is 3.52 km/s, and just for kicks, the probability of it having that slope with that dispersion is 5.40%.
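For concreteness, here's a minimal sketch of that kind of Monte Carlo in Python; the names and the plain least-squares slope fit are my stand-ins, not the exact code I ran:

```python
import numpy as np

def slope_significance(x_arcmin, v_sys, sigma_v, observed_slope,
                       n_trials=100_000, seed=0):
    """Fraction of fake galaxies whose best-fit velocity gradient
    exceeds the observed slope in absolute value."""
    rng = np.random.default_rng(seed)
    n_exceed = 0
    for _ in range(n_trials):
        # Draw Gaussian velocities at the real stars' positions.
        v_fake = rng.normal(v_sys, sigma_v, size=len(x_arcmin))
        # Least-squares slope of velocity vs. major-axis position.
        slope = np.polyfit(x_arcmin, v_fake, 1)[0]
        if abs(slope) > abs(observed_slope):
            n_exceed += 1
    return n_exceed / n_trials

# e.g. slope_significance(x_arcmin, v_sys, 3.52, -0.484)
```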

Wednesday, July 15, 2009

Results! (sort of)

Well this week I've actually accomplished several things.
I sent out a paper to several of Beth's collaborators detailing my results for the kurtosis of the various dwarf galaxies' velocity distributions.
For the most part, their small star numbers mean that their negative kurtosis falls in line with the simulated results for a normal distribution with the same number of stars.

However, Canes Venatici II has an anomalous, statistically significant high kurtosis, which is probably due to its having several far-off outliers (which make thicker tails, and thicker tails mean higher kurtosis, remember?).
But for the rest of them we'll need more stars to determine anything else through that method.

But I also decided that there is almost definitely some rotation going on in Willman I. I plotted the mean velocities in various sectors of the data, which seem to show that the "left" side is moving toward us faster than the average velocity, while the "right" side is moving away from us relative to it. When I chopped out the stars with high metallicity, the probable outliers, I saw a really apparent gradient in the velocity distribution. It's pretty neat!
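Roughly, the sector averaging looks like this (a sketch: the number of sectors, the coordinate conventions, and the metallicity cut are all placeholders):

```python
import numpy as np

def sector_mean_velocities(x, y, v, n_sectors=8):
    """Mean velocity offset from the systemic velocity in angular
    sectors around the galaxy center (x, y relative to the center)."""
    theta = np.arctan2(y, x)
    edges = np.linspace(-np.pi, np.pi, n_sectors + 1)
    v_sys = np.mean(v)
    offsets = np.full(n_sectors, np.nan)
    for i in range(n_sectors):
        in_sector = (theta >= edges[i]) & (theta < edges[i + 1])
        if in_sector.any():
            offsets[i] = np.mean(v[in_sector]) - v_sys
    return offsets

# Chop out probable outliers first (feh and feh_max are placeholders):
# keep = feh < feh_max
# sector_mean_velocities(x[keep], y[keep], v[keep])
```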

It's probably evidence for streaming motions, although I'm still not entirely sure what that means, other than that it can be a result of tidal disruption.

To Do:
1) Beautify my kurtosis/skewness plots per Josh's suggestion.
2) Figure out what's going on with Nmix/correspond with Jay.
3) Look for rotation in the other galaxies.
4) Take a nap! The midnight Harry Potter viewing was probably not the brightest idea, despite being unexpectedly funny and entertaining.

Wednesday, July 8, 2009

Longer Time No Blog

It's been a while, but I've basically been doing the same things or variants on them.
I made a bunch of sweet figures of simulated data and its kurtosis, skewness, and Nmix results. I then plotted real data to see if they matched up at all, which wasn't really the case. But they're still potentially useful figures.

The main problem was that my simulations show that kurtosis is rather strongly negative for simulated Gaussians with small numbers of stars. But most of our data sets have small numbers of stars, so a negative kurtosis won't really tell us anything.
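Here's a quick way to see that bias, using scipy's excess kurtosis (which should be 0 for a true Gaussian):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
for n in (20, 60, 175, 1000):
    # Mean excess kurtosis over many pure-Gaussian samples of n stars;
    # for small n it comes out noticeably below zero.
    k = [kurtosis(rng.normal(size=n)) for _ in range(5000)]
    print(n, np.mean(k))
```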

Also, I randomly discovered that running Nmix on the Segue I data set with and without the 5 outlier points yields drastically different results. With the points, the probability of a single component is close to 70%; without them, it is only 4%. Something is clearly wrong with Nmix.

But now I'm trying to determine if there are "streaming motions" in the Willman I kinematics, which basically means rotation. I'll divide the data into chunks based on position and then calculate the average velocity in each, to see whether some chunks are systematically blueshifted or redshifted relative to the mean, which would indicate spinning. I'm not really sure what that would mean, but since Willman I is not a spiral galaxy, it would probably point to tidal disruption.

Well that's all for now and hopefully I'll keep up the blogging.

Monday, June 29, 2009

Long Time No Blog

I seem to have slowed down after my initial daily and even twice daily blog posts. So now I'll try to summarize what I've been doing for the past week.

I'm pretty sure I spent most of the time simulating perfect Gaussians and then running Nmix on the data to find the single-component confidence level. I repeated this 100 times and averaged the confidence, then repeated the whole process for different numbers of stars to see just how many are necessary for Nmix to be reasonably accurate. The results weren't too satisfying: it seems that we need around 175 stars for Nmix to reach 70% or so single-component confidence.

This is unfortunate, as most of the samples that I have seen only have 20-60 stars, well under the limit at which Nmix can actually determine that there is only a single component.

I then repeated this, adding in some randomly generated error to the simulated Gaussian, to see if this changed its ability to detect shape. This had practically no effect on the Nmix results, which makes some sort of sense.
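The whole harness boils down to something like the sketch below; run_nmix_single_component_prob is a hypothetical wrapper standing in for writing the sample to disk, running Nmix on it, and parsing the k = 1 probability back out:

```python
import numpy as np

def run_nmix_single_component_prob(v):
    """Hypothetical wrapper: write `v` to a file, run Nmix on it, and
    parse the posterior probability of k = 1 from the output."""
    raise NotImplementedError  # depends on the local Nmix setup

def mean_single_component_confidence(n_stars, n_trials=100,
                                     sigma_err=None, seed=0):
    """Average Nmix single-component confidence over repeated
    simulated Gaussian samples of a fixed size."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_trials):
        v = rng.normal(0.0, 1.0, size=n_stars)
        if sigma_err is not None:
            # Optionally perturb each star by a Gaussian measurement error.
            v = v + rng.normal(0.0, sigma_err, size=n_stars)
        probs.append(run_nmix_single_component_prob(v))
    return np.mean(probs)
```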

On Wednesday, Marla Geha was visiting Beth and she sent me the velocity distributions for 8 other dwarf galaxies (after several reminders). I'll do something with them once I finish with the simulated data.

My newest project was to simulate a sample composed of two Gaussians to see if Nmix actually works the way it should in detecting multiple structures. I placed the two Gaussians one standard deviation apart. This time Nmix was even less accurate at finding the correct structure than it was with the single Gaussian. I managed to stupidly delete a bunch of that data, but there was only about 40% confidence for 2 components with under 200 stars. With 1000 stars there was on average about 70% confidence, which indicates that it did work. But we don't have nearly that many stars in our samples, so it's pretty discouraging.
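Generating the test sample is the easy part (a sketch, assuming equal weights and unit dispersions for the two components):

```python
import numpy as np

def two_gaussian_sample(n_stars, separation=1.0, seed=0):
    """Equal-weight mixture of two unit-dispersion Gaussians whose
    means are `separation` standard deviations apart."""
    rng = np.random.default_rng(seed)
    # Assign each star to one of the two components with probability 1/2.
    component = rng.integers(0, 2, size=n_stars)
    means = np.array([0.0, separation])
    return rng.normal(means[component], 1.0)
```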

I also started testing the kurtosis of various simulated data sets and of the actual samples, and will be attempting to figure out how to use that effectively in the near future (or the next time the power doesn't randomly shut off in the middle of my program).


To Do:
1) Add kurtosis to double Gaussian test
2) Read Simon and Geha 2007
3) Do the ratio stuff for double Gaussian

Monday, June 22, 2009

Jackknife has too many k's in a row for one word.

Since my last blog post, I have spent my time modifying my bootstrap code to do jackknives instead.

Jackknives are somewhat similar to bootstraps in that they test modified data sets, but the jackknife alteration is simply to remove a single data point, compute the statistic in question, then put the point back, remove another, and repeat until each data point has been removed once. This gives a good estimate of the variance of the statistic, because it shows how much any single data point affects the result.
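The whole scheme fits in a few lines (where statistic can be any function of the sample, e.g. scipy.stats.kurtosis):

```python
import numpy as np

def jackknife(data, statistic):
    """Leave-one-out jackknife replicates and variance of a statistic."""
    n = len(data)
    # Recompute the statistic with each point removed in turn.
    reps = np.array([statistic(np.delete(data, i)) for i in range(n)])
    # Standard jackknife variance estimate.
    var = (n - 1) / n * np.sum((reps - reps.mean()) ** 2)
    return reps, var
```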

I'll find out more about jackknifing and bootstrapping once I actually get that book from the Astro library (probably tomorrow).

My jackknifing code made a bunch of pretty plots showing the Nmix probability of each number of components for every jackknifed sample. It's pretty interesting that the single-component probability seems to fluctuate a lot across the different jackknives but quickly converges at 2 components and beyond.

But an alarming discovery was that the simulated normal sample with added uncertainty sometimes appeared to be strongly bimodal. So I decided to test the power of Nmix with a bunch of simulated pure normal populations of varying star number.

Theoretically Nmix should work better with more stars (as more data points make the Normal curve much more visible), so I'll see if that pans out as expected.

To Do:
1) Find the mean and standard deviation of the kurtosis of the jackknifed samples
2) LEARN
3) According to Wikipedia, there is also a kick called a jackknife.
4) Figure out how to do that.

Thursday, June 18, 2009

Rage Against the Machine.

Well the first thing that I learned today was that to do a good bootstrapping estimation, you must bootstrap n*(log n)^2 times. Then you can find a good estimate of the true population variance as well as the variance of the statistic in question.
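Applied to a basic bootstrap variance estimate, that rule would look something like this (I'm assuming the natural log here):

```python
import numpy as np

def bootstrap_variance(data, statistic, seed=0):
    """Bootstrap variance of a statistic with B = n*(log n)^2 resamples."""
    rng = np.random.default_rng(seed)
    n = len(data)
    n_boot = int(np.ceil(n * np.log(n) ** 2))
    reps = np.array([
        statistic(rng.choice(data, size=n, replace=True))
        for _ in range(n_boot)
    ])
    return reps.var(ddof=1)
```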

Apparently the book I was reading about bootstrapping online is in the Astro Library. And the print version hopefully won't have the Google Book Preview's random missing pages. So I'll head over there and check it out as soon as it stops raining, which is starting to seem like never.

The rest of the day was spent tediously debugging my code, which inexplicably refused to make a FITS file of the data I wanted.

Wednesday, June 17, 2009

Giving Bootes II the Boot

Yesterday I slightly refined my code to also create a summary .pe file. This file includes the k's, along with the weights, means, and variances of the predicted components. Thus when Nmix detects two populations, it is pretty easy to see whether the second Gaussian contains only a small fraction of the total number of stars. If it does, it's probably caused by an outlier and shouldn't be viewed as significant.
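The weight check itself is trivial once the .pe file is parsed; here's a sketch, where the 10% cutoff is just an arbitrary placeholder, not a recommended value:

```python
def significant_components(weights, min_weight=0.10):
    """Indices of mixture components whose weight passes a cutoff.

    `weights` are the component weights parsed from the .pe summary;
    the 10% cutoff is an arbitrary placeholder.
    """
    return [i for i, w in enumerate(weights) if w >= min_weight]

# e.g. significant_components([0.93, 0.07]) -> [0]; the 7% component
# would be flagged as probably outlier-driven.
```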

Most of today was spent attempting to extract the actual Bootes II data from the data set. This was complicated by the fact that (as I realized after some time) some of the data points were duplicates, and some were just ridiculous and shouldn't even be counted. But with Beth's help I was finally able to narrow the 243 stars down to the 21 that are actually supposed to be in the galaxy. I then did my typical bootstrap/Nmix analysis.

But first I ran Nmix on the actual data, with the following results:
Bootes II Nmix Results
1) 21.3% confidence in 1 component, with mean of -122.9
2) 18.9% confidence in a 2 component fit, with means of -128.5 and -114.58
3) 14.2% confidence in 3 components
4) 10.4% confidence in 4 components

So these results are rather ambiguous, with no really obvious number of components. But that is probably a result of the small sample size of 21 stars. This is the least confident of all of the samples I have tested so far.

Bootes II Bootstrap Nmix Results
1) Only 15% of the resamples had above 10% confidence for a single population, and only 5% above 20%
2) For both the 2- and 3-component fits, 25% had a confidence above 10%
3) 10% had a 2-component fit with confidence above 20%, and 5% had a 3-component fit above that
4) However, in one of the 2-component best fits where there was no chance of a single peak, 83% of the stars were in one of the peaks. I'm not sure where to make the cutoff that says the other peak is insignificant.

Overall, this was ridiculously inconclusive, but 2 components seems to win out.

To Do:
1) Learn more about bootstrapping/jack-knifing
2) Make a code that will read the Nmix data, then determine if the multi-component fit is significant or not based on the relative weights
3) Kurtosis