Statistics


I came across this gem on the TrueOrigin list:

I asked her to pick a digit (0 through 9). Then do it again, and again, 53 times. You will have picked a number with 53 digits. The a priori probability that you would have picked that number is 1 chance in 10^53, but you did the impossible.
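The quoted "experiment" is easy to reproduce. A minimal sketch, using only the standard library:

```python
import random

# Reproduce the quoted "experiment": pick 53 random digits.
digits = [random.randrange(10) for _ in range(53)]
number = "".join(str(d) for d in digits)

# Each digit had 10 equally likely choices, independently,
# so the a priori probability of this exact string is 10^-53.
p = 10 ** -53
print(number, p)
```

Every run prints a different 53-digit string, and each of those strings was exactly as improbable as the others.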

Is it true that the chance of doing what is described above is indeed very small (10^-53)? Let's see.

In my country, we have a lottery called "6 out of 49". You have a 7x7 grid, with numbers from 1 to 49, out of which you are supposed to pick the winning six. Let's say you bought a ticket and picked 6 numbers at random. What is the chance of winning the lottery?

Well, the total number of combinations is C(49, 6), which is 49! / (6! x (49 - 6)!), where "n!" (read: n factorial) means "1 x 2 x 3 x ... x n". Calculating it gives us 13,983,816 possibilities (if I made a mistake, please let me know). So, the probability of picking the right combination is approximately 1 / 14 million, or 7.15 x 10^-8.
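The arithmetic can be checked in a couple of lines (`math.comb` computes C(n, k) exactly):

```python
from math import comb

total = comb(49, 6)   # C(49, 6): all possible 6-number tickets
p_win = 1 / total     # chance that one random ticket wins

print(total)          # 13983816
print(p_win)          # ~7.15e-08
```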

Now, what is the probability that you picked 6 random numbers? Put this way, the question sounds silly: the probability is 1, of course. You just did it, it's a past event - it either happened or it didn't, so the probability is either 1 or 0. And this is what the above "experiment" is all about: picking 53 random numbers, and then asking "how probable is it that you in fact picked 53 random numbers?", can only have one answer: the probability is 1.

How do we distinguish between the two cases? We need what is called a specification. The second case had one such specification: the winning combination. In other words, we must be able to ask the question: what is the probability that the sequence we just picked (call it x) is equal to T? We can do that in the second case; but in the first, we have nothing to replace T with - except for x, in which case the question becomes "what is the probability that x = x?". Once we have a specification, we can calculate the probability. Once we have that, we can determine whether the event was random or not: anything below a specific value (we'll use 10^-150) cannot be random. (More on this value later.)

What conditions must the specification meet? Well, only one: it must be independent. To borrow an example from Dembski, an archer who shoots at a wall, then paints a bullseye around the arrow, is not a very good archer. (The probability of hitting the target - the probability of x being equal to T, to use the symbols above - is 1, since T is defined by x. The two are anything but independent.) On the other hand, if the bullseye is already painted, and the archer hits it 20 times right in the center, then we can be quite sure this was no random event. (I am, of course, ignoring various ways to "rig" this, like a bullseye 20 meters across, with the archer standing two feet in front of it, and other variations.)

Should the specification occur before the event? No, this is not necessary. A process generating 3141592 will hint that it's not random. A process generating the first 100 digits of pi will absolutely tip us off. An independent specification is enough to calculate probabilities, whether it's made known before or after the fact.

Can we make mistakes? In other words, is it possible to consider that an event is random when it isn't, or the other way around?

1. We might not realize that there is a specification. A process generating 2030481... might seem random, until we realize that each digit is the corresponding digit of pi, minus one. Or, some message might be encrypted with a good algorithm, and the resulting string would appear random to all statistical tests (like Diehard). False negatives are a possibility.
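The "pi minus one" case can be made concrete. A hedged sketch of a detector for this one particular hidden specification (the pi digits are hardcoded here purely for illustration):

```python
# First digits of pi, hardcoded for illustration only.
PI_DIGITS = "31415926535"

def is_pi_minus_one(seq: str) -> bool:
    """True if every digit of seq is the matching digit of pi, minus one (mod 10)."""
    return all(int(s) == (int(p) - 1) % 10
               for s, p in zip(seq, PI_DIGITS))

print(is_pi_minus_one("2030481"))  # True: looks random, but is specified
print(is_pi_minus_one("2030482"))  # False
```

Of course, this detector only finds the one specification it was written to look for, which is exactly the point: a sequence can pass every test we happen to apply and still be specified.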

2. On the other hand, I don't see any way for false positives. If the "cutoff value" is low enough - like the 10^-150 I'm using (which I borrowed from Dembski) - there is no way such an event can happen randomly (that is, without an intelligence involved). [I am not a subscriber to the QM "anything possible is necessary" many-worlds interpretation.] Of course, with a cutoff value too large - like 1 in 10 - false positives can happen. This is why Dembski picked 10^-150: to avoid any possibility of a false positive.

Finally, why 10^-150?

It is estimated that there are about 10^80 particles in the known universe.
The Planck time is the smallest meaningful unit of time: about 10^-45 seconds (Dembski's figure). This means that nothing physical can change state more than 10^45 times per second.
Finally, 10^25 seconds is more than a billion times the universe's estimated age.

Multiplying these figures, 10^80 x 10^45 x 10^25 gives 10^150. This means that, if all the particles in the known universe, for a billion times longer than they (allegedly [1]) existed, "tried" to generate new combinations, they wouldn't have exhausted 10^150 such combinations. That is, there are simply not enough resources in this universe to account for an event with a probability smaller than 10^-150. Which means that, if we encounter an event whose probability (reminder: we need an independent specification to be able to calculate this probability) is smaller than 10^-150, an intelligence must have been involved.
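Since Python handles arbitrarily large integers, the multiplication above can be checked exactly:

```python
particles = 10 ** 80    # rough particle count of the known universe
ops_per_sec = 10 ** 45  # max state changes per second (from the Planck time)
seconds = 10 ** 25      # generous upper bound on elapsed time

resources = particles * ops_per_sec * seconds
print(resources == 10 ** 150)  # True
```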

In other words, if P(E) < 10^-150, E was caused by an intelligence.

Whew!



[1] I'm a young-earth creationist. I believe the universe to be less than 10,000 years old.
