Tuesday, May 9, 2006

CSL-1 and confidence levels

When it was announced in January that CSL-1 was not a cosmic string, I forgot to link the preprint:
What I would like to discuss is the confidence level. As you may remember, the spectra of the two components of CSL-1 were compared and argued to be identical at a confidence level exceeding 99.9% (page 3 here).

How do you interpret this number and what are the lessons? Should you interpret it as the probability that the two components were images of the same galaxy? Then it would indeed be quite shocking that CSL-1 turned out not to be a cosmic string.

My understanding is that the number 99.9% was calculated from the correlation between the spectra - treated as functions of the frequency. If these two functions were random, the confidence level would be very low. Because these two functions were much more correlated than two random functions of the frequency, they obtained a number that was so close to 100%. Please write down the exact formula - or algorithm - by which 99.9% was calculated - if you know it.
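To make my guess concrete, here is a minimal sketch of how such a number could arise. It is not the procedure of Sazhin et al., which I do not know. It assumes the "confidence level" is the fraction of randomly shuffled spectra that are less correlated with the first spectrum than the second one actually is. The toy spectra, the line positions, the noise level, and the names `pearson` and `confidence_vs_random` are all my own illustrative assumptions.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def confidence_vs_random(spec_a, spec_b, trials=2000, seed=0):
    """Return (observed r, fraction of shuffled copies of spec_b that
    correlate with spec_a less strongly than spec_b itself does)."""
    rng = random.Random(seed)
    observed = pearson(spec_a, spec_b)
    shuffled = list(spec_b)
    below = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # destroys any structure in frequency
        if pearson(spec_a, shuffled) < observed:
            below += 1
    return observed, below / trials

# Hypothetical toy spectra: the same two absorption lines plus independent noise.
rng = random.Random(1)
template = [1.0
            - 0.5 * math.exp(-((i - 60) / 4.0) ** 2)
            - 0.3 * math.exp(-((i - 140) / 6.0) ** 2)
            for i in range(200)]
spec_a = [t + rng.gauss(0, 0.05) for t in template]
spec_b = [t + rng.gauss(0, 0.05) for t in template]

r, conf = confidence_vs_random(spec_a, spec_b)
print(f"r = {r:.3f}, confidence vs. random functions = {conf:.4f}")
```

The point of the sketch is only that any two spectra sharing the same lines will beat "random functions" by a huge margin, so a number near 100% is almost guaranteed by this kind of null hypothesis.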

But it seems to me that the actual procedure that led to the 99.9% figure did not care about the "generic" correlation between the spectra of two galaxies. It is probably very common for two galaxies, especially two nearby galaxies, to have very similar spectra. They typically have very similar abundances of all elements and their isotopes - or, more or less equivalently, the same percentages of stars of different types.

My guess is that if the correlation between these two spectra were compared with the correlation of spectra of other pairs of galaxies, the agreement would look much less spectacular. And this elementary step has not been done, I think. So it seems to me that the figure 99.9% was just a meaningless Bayesian random number whose precise value depends on "priors". And the case of CSL-1 was an example of a highly unrealistic prior.
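The comparison I have in mind could look roughly like the following sketch. It assumes a control sample of spectra of other galaxies is available. Here the sample is replaced by hypothetical toy galaxies that share the same lines with slightly different depths, and the names `make_galaxy` and `percentile_among_pairs` are mine. The candidate pair is ranked against pairs of different galaxies rather than against random functions.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def make_galaxy(rng, n=200):
    """Hypothetical toy galaxy: common lines, slightly varying depths, noise."""
    d1 = 0.5 * (1 + rng.gauss(0, 0.1))
    d2 = 0.3 * (1 + rng.gauss(0, 0.1))
    return [1.0
            - d1 * math.exp(-((i - 60) / 4.0) ** 2)
            - d2 * math.exp(-((i - 140) / 6.0) ** 2)
            + rng.gauss(0, 0.05)
            for i in range(n)]

def percentile_among_pairs(observed_r, galaxies):
    """Fraction of distinct galaxy pairs whose correlation is below observed_r."""
    rs = [pearson(galaxies[i], galaxies[j])
          for i in range(len(galaxies))
          for j in range(i + 1, len(galaxies))]
    return sum(r < observed_r for r in rs) / len(rs)

rng = random.Random(2)
control = [make_galaxy(rng) for _ in range(40)]   # 780 pairs of different galaxies
candidate = (make_galaxy(rng), make_galaxy(rng))  # also two different galaxies

r = pearson(*candidate)
frac = percentile_among_pairs(r, control)
print(f"r = {r:.3f}, fraction of control pairs with lower r = {frac:.3f}")
```

In this toy setup the candidate pair is drawn from the same population as the control pairs, so its rank among them carries no special meaning, even though the same pair would look overwhelmingly "significant" when compared with random functions.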

Incidentally, PRIOR was a chain of department stores in socialist Czechoslovakia. It stands for "Přijdeš rychle i odejdeš rychle": you come quickly and you leave quickly. ;-) For example, in the late 1980s, one could not even buy toilet paper. Once PRIOR was bought by companies such as Tesco or K-Mart, these problems completely evaporated. Capitalism works, socialism doesn't.

Was it correct to think that the agreement of the spectra meant that the probability that the components were images of the same galaxy exceeded 99.9%? I think it is accurate to say today that the probability that such a prediction was a correct conclusion was below 0.1%. ;-)

OK, let me assume that it was not a correct estimate of the "subjective" probability that the components were truly identical. Was there a better algorithm that Sazhin et al. could have used to get a more sober prediction of the probability that the images are completely identical, based on the spectral analysis?

I am afraid that the answer is No - unless we know everything about the distribution of all galaxies, their spectra, and all of their correlations, including the dependence of the correlation on the proximity and other quantities. More realistically, I am afraid that even the most elementary check was not carried out - the impressive correlation between the spectra was not compared with the correlation of the spectra of other pairs of nearby galaxies. Correct me if I am wrong.

Bayesian estimates of probability are never quantitative science. The numbers representing the likelihood always have a subjective meaning. They depend on subjective assumptions and temporary knowledge, and they are not subject to objective science. But despite this general statement, the estimates of probabilities can be done more or less rationally, and because we know that CSL-1 was not a cosmic string, it seems that the 99.9% estimate was not made very rationally.
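The dependence on the prior can be made explicit with Bayes' formula. The sketch below is only an illustration: the function name `posterior_same_galaxy` and every number in it are hypothetical and are not taken from the CSL-1 analysis. It assumes that two images of the same galaxy always match, and that two different galaxies match this well with probability 0.001.

```python
def posterior_same_galaxy(prior, p_match_if_same, p_match_if_different):
    """Bayes' formula for P(same galaxy | spectra match)."""
    numerator = p_match_if_same * prior
    denominator = numerator + p_match_if_different * (1.0 - prior)
    return numerator / denominator

# Hypothetical likelihoods, fixed; only the prior changes.
P_MATCH_IF_SAME = 1.0
P_MATCH_IF_DIFFERENT = 0.001

for prior in (0.5, 1e-3, 1e-6):
    post = posterior_same_galaxy(prior, P_MATCH_IF_SAME, P_MATCH_IF_DIFFERENT)
    print(f"prior = {prior:g}  ->  posterior = {post:.6f}")
```

With the same data, a prior of 50% gives a posterior of about 99.9%, while a prior of one in a million gives a posterior of about 0.1%. The "99.9%" therefore reflects the assumed prior at least as much as the spectra.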

It reminds me of the Bayesian estimates of the space shuttle failures. As Feynman argued after the Challenger disaster, the predictions of an extremely low failure rate could never have been justified rationally because they could not have been based on the frequentist understanding of probability.