Thursday, August 07, 2003

Information

Let's define I(E) = -log2 P(E), where E is an event. This is the amount of information contained in (imparted by) that event, or in other words the amount of uncertainty removed by that event, and is measured in bits. (Why uncertainty? Let {E*} be the set of possible relevant events, of which E is a member. Before E, any of the elements of {E*} could have occurred; the fact that we obtained E removed that uncertainty. If the outcomes are equally likely, P(E) is, of course, 1 / the number of elements in {E*}.)
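The definition is easy to play with in Python (the helper name `information_bits` is my own):

```python
import math

def information_bits(p):
    """Self-information, in bits, of an event with probability p."""
    return -math.log2(p)

# A fair coin flip: two equally likely outcomes, P(E) = 1/2, so 1 bit.
print(information_bits(1 / 2))   # 1.0

# One specific card drawn from a 52-card deck: P(E) = 1/52.
print(information_bits(1 / 52))  # about 5.7 bits
```

Note how rarer events carry more information: the less likely the outcome, the more uncertainty its occurrence removes.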

Using the value we determined earlier, the "cutoff value" of 10^-150, the information contained in an event E with that probability is -log2 10^-150, which is just under 500 bits (about 498.3; I'll round up to 500). Therefore, another way of specifying the "point of no return" is this: anything with an informational content larger than 500 bits could not have occurred without the intervention of an intelligence.
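The arithmetic is worth checking; note that the exact value is a little under 500:

```python
import math

# The "cutoff value" 10^-150, expressed as an information content in bits:
bits = -math.log2(10 ** -150)
print(bits)  # about 498.3 bits, conventionally rounded up to 500
```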

What are possible sources for this information? Well, as far as I know (please let me know if you find another one), only 3 exist: intelligence, randomness, and laws (deterministic functions). Of these 3, the last one doesn't actually create information (more on this below), but can be used to move it from one place to another. (E.g., a program drawing a Rembrandt painting doesn't create the information: it copies it from whatever storage source it is using to the screen or printer.)

Why can't deterministic functions create information? Well, let's see:

P(A & B) = P(A) x P(B) -- where A and B are independent
I(A & B) = I(A) + I(B) -- this follows from I(A) = -log2 P(A)
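Both formulas can be verified numerically; a quick sketch, with `I` as shorthand for the definition above:

```python
import math

def I(p):
    """Information, in bits, of an event with probability p."""
    return -math.log2(p)

# Two independent fair coin flips: the joint event has probability 1/4,
# and its information is the sum of the two individual contributions.
assert math.isclose(I(1/2 * 1/2), I(1/2) + I(1/2))  # 2.0 = 1.0 + 1.0

# An independent coin flip and die roll behave the same way:
assert math.isclose(I(1/2 * 1/6), I(1/2) + I(1/6))
```

This additivity is exactly why logarithms appear in the definition: multiplying probabilities corresponds to adding information.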

However, we're interested in information produced by functions. That is, what is I(A & f(A))? Well, f(A) is definitely not independent of A, so we will use I(B | A) for "the (conditional) information of B, given A". From this, we obtain that

I(A & B) = I(A) + I(B | A) -- note that, for independent A and B, I(B | A) = I(B), so we get back to the previous formula

I(A & f(A)) = I(A) + I(f(A) | A)

What is I(f(A) | A)? It will be helpful to get back to probabilities:

I(f(A) | A) = -log2 P(f(A) | A)

But f is a deterministic function - that is, f(A) is completely determined by A; given A, f(A) has a probability of 1:

P(f(A) | A) = 1

which means that

I(f(A) | A) = 0

Getting back to our initial problem,

I(A & f(A)) = I(A) + I(f(A) | A) = I(A)

In other words, we obtained that deterministic functions add no information. (This is common knowledge in the field of cryptography, another hobby of mine, where it's expressed as "functions cannot create entropy".)
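The derivation can be mirrored in a small computation (names are mine, and `f` is an arbitrary deterministic function): the joint distribution of A and f(A) has exactly the same entropy as A alone.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Shannon entropy, in bits, of a distribution given as outcome counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# A: a uniformly random value in 0..7, i.e. exactly 3 bits of entropy.
outcomes_a = list(range(8))

def f(a):
    # Any deterministic function will do; squaring is an arbitrary choice.
    return a * a

h_a = entropy_bits(Counter(outcomes_a))                  # H(A)
h_a_fa = entropy_bits(Counter((a, f(a)) for a in outcomes_a))  # H(A, f(A))
print(h_a, h_a_fa)  # both 3.0 -- pairing A with f(A) adds nothing
```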



We now have an interesting result. Functions cannot add information. Randomness can, but no more than 500 bits. So the only conclusion we can draw is this: any piece of information more than 500 bits long (or, any event with a probability lower than 10^-150) must have an intelligent cause.



It is important that we keep thinking about the issue in terms of both information and probability, or we might get confused. For example, someone objected that it is possible to accumulate any amount of information "piece by piece", one bit at a time: none of the individual bits is outside the realm of randomness, but together they can break the 500-bit barrier. Or, it could be objected that a random process - like flipping a coin - can easily generate 500 random bits. To answer these, we need to get back to the first part of this (Statistics) and talk about probabilities again. (The answer to both objections seems obvious to me; however, if you're of a different opinion, email me and I'll detail what I mean.)
