Wednesday, August 20, 2003

Human DNA

Is the probability of obtaining a DNA sequence which codes for a human (i.e., a living being capable of interbreeding with humans) by any combination of random processes and deterministic functions (like natural selection) less than 10^-150?

Let's assume we already have the humans' alleged ancestor race; call it apes. Is it possible for random mutations to change ape DNA into human DNA?

Let S = {A, C, G, T} be the possible values for a nucleotide (a single DNA "letter"), and S* a sequence s[1] ... s[n] where each s[i] is in S. We define

dist* (a, b), for a, b in S*, as the number of point mutations needed to change a into b (or vice versa):

dist* (a[1] ... a[n], b[1] ... b[m]) = sum (i = 1..n, dist (a[i], b[i])) + (m - n), with m >= n >= 1

dist (a, b) = {0 if a = b, 1 otherwise}, with a, b in S

In other words, if everything works out perfectly, it takes at least dist* (A, B) point mutations to convert one into the other, where A is a member of the ape DNA set, and B is a member of the human DNA set.
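For the toy alphabet above, dist* is easy to express in code. A minimal sketch in Python (the function names are mine), treating sequences as strings over S:

```python
# A sketch of the dist* metric defined above, with sequences written as
# Python strings over the alphabet S = {A, C, G, T}.

def dist(a, b):
    """Point distance between two letters of S: 0 if equal, 1 otherwise."""
    return 0 if a == b else 1

def dist_star(a, b):
    """dist*(a, b) = sum(dist(a[i], b[i])) + m - n, with m >= n: the number
    of point mutations needed to change a into b, plus the length difference."""
    if len(a) > len(b):
        a, b = b, a  # ensure len(a) = n <= m = len(b)
    return sum(dist(x, y) for x, y in zip(a, b)) + len(b) - len(a)

print(dist_star("ACGT", "ACGA"))  # one differing letter -> 1
print(dist_star("ACG", "ACGTT"))  # identical prefix, two extra letters -> 2
```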

How close are these sets? It is alleged that they are very close - that the resemblance is between 95% and 99%. Out of 3 billion base pairs, this would give a difference of 30 to 150 million. If we go further and claim that only 1 in 10,000 of the DNA is actually relevant - the rest being "junk DNA" - we are left with a relevant difference of only 3 to 15 thousand "letters".

This means that we need between 3 and 15 thousand point mutations to get from apes to humans. How probable is that, using purely random mechanisms? Well, taking the lower bound, the problem is equivalent to getting from AAA...A to CCC...C, both strings having 3,000 letters, changing one letter at each step into one of the values A, C, G, or T - at random. At each step, the probability of getting the right one is 1/4; since we need 3,000 lucky changes, the probability of getting them through purely random means is (1/4)^3,000 = 2^-6,000. This is WAY below 2^-500.
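Since (1/4)^3,000 underflows ordinary floating point, the arithmetic is easiest to check in log space; a quick sketch:

```python
import math

n = 3000            # point mutations needed (the lower bound from the text)
p_per_step = 1 / 4  # chance of hitting the right letter at each step

# log2 of the overall probability: 3,000 * log2(1/4) = -6,000 bits
log2_p = n * math.log2(p_per_step)
print(log2_p)       # -6000.0, i.e. a probability of 2^-6000

# the 10^-150 cutoff expressed the same way, for comparison
print(math.log2(10 ** -150))  # about -498.3
```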

What about the Weasel program? (You can find an example here.) Couldn't such a program get from AAA...A to CCC...C much faster? Yes, of course it could. But the problem with the Weasel program - or any similar algorithm - is that it already knows its target. In other words, the Weasel program is equivalent to the following one:

print "CCC...C"

Unless someone can convincingly argue that evolution's goal was to transform apes into humans, any algorithm with a predefined target (which can be reduced to the above) is out. This includes more "roundabout" algorithms that try to conceal the fact that they know their target, like:

for i = 1 to 3000
    print chr(-i + ord("A") + (2 * i + 6) / 2 - 1)
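To make the comparison concrete, here is a minimal Weasel-style search in Python for the toy problem (the target is shortened to 30 letters for readability; all names are mine). Note that TARGET is hard-coded - the search converges quickly precisely because it already knows where it is going:

```python
import random

TARGET = "C" * 30  # the known target - this is exactly the objection above
ALPHABET = "ACGT"

def score(s):
    """The "fitness" is simply closeness to the known target."""
    return sum(a == b for a, b in zip(s, TARGET))

current = "A" * len(TARGET)
steps = 0
while current != TARGET:
    steps += 1
    # mutate one random position; keep the child unless it scores worse
    i = random.randrange(len(TARGET))
    child = current[:i] + random.choice(ALPHABET) + current[i + 1:]
    if score(child) >= score(current):
        current = child

print(current, steps)  # reaches the target; typically a few hundred steps
```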

What about using an evaluation function (for the natural selection part)? Such functions can serve as probability amplifiers (they lose information instead of adding new information, but that's OK in this case - we already have a source of information, randomness; we now need something to "guide" that information).

Well, we have two relevant classes of such functions:
1) functions that know the target - and we're back to the above case - and
2) functions that calculate a "fitness", and it so happens that the fitness of the CCC...C string is greatest, while the fitness of the AAA...A string is lowest.

Option 2 is a viable one, provided that someone produces such a function for the real-life case of human DNA. (I know it's easy to do for our toy strings.) Can this be done without using any post hoc rationalization (like "humans are smarter than apes, therefore the function must give them more points")? [Remember, this must work for any ancestor/evolved species pair. Such a function won't help with bacteria. The No Free Lunch theorems would suggest that there is no such function that works for all cases.]
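For the toy strings, here is what such a "class 2" function might look like (a hypothetical sketch; the per-letter weights are mine). It never names CCC...C, yet CCC...C maximizes it and AAA...A minimizes it - which is why the toy case is easy and the real one is not:

```python
from itertools import product

WEIGHT = {"A": 0, "G": 1, "T": 1, "C": 2}  # hypothetical per-letter scores

def fitness(s):
    """A "class 2" evaluation function: no explicit target anywhere."""
    return sum(WEIGHT[c] for c in s)

# Brute-force check over all 4^5 strings of length 5: the maximum really is
# CCCCC and the minimum AAAAA, although neither is named above.
strings = ["".join(p) for p in product("ACGT", repeat=5)]
print(max(strings, key=fitness), min(strings, key=fitness))  # CCCCC AAAAA
```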

So, the final conclusion would be: if evolution's goal is not to produce humans, and if there's no general algorithm that nevertheless happens to favor "closer-to-human DNA" over "closer-to-ape DNA", human DNA can't have occurred by chance - even with the help of natural selection.

Thursday, August 14, 2003


For me, the idea of a string having a probability is absurd - I can't parse it at all. Let me clarify this with an example: what is the probability of a chair? Both a chair and a string are objects. OK, the string is an informational entity, not a physical object. Then what is the probability of an equation? Neither of these questions makes any sense.

Now, if we want it to make sense, we must start to expand the question. What is the probability that a 500-bit string will occur? Still not good enough - occur out of thin air? In my daily emails? So let's try again: what is the probability of a 500-bit string occurring in the following experiment: "write down 500 bits"? Well, 1 if you do it, 0 if you don't. In NO case is it going to be any other value.

How do you fill in the blanks so that "what is the probability of a 500-bit string ..." gives you any other result? Please email me if you find a way.

Thursday, August 07, 2003


Let's define I(E) = -log2 P(E), where E is an event. This is the amount of information contained in (imparted by) that event - in other words, the amount of uncertainty removed by that event - and is measured in bits. (Why uncertainty? Let {E*} be the set of possible relevant events, of which E is a member. Before E, any of the {E*} elements could have occurred; the fact that we obtained E removed that uncertainty. P(E) is, of course, 1 / the number of elements in {E*}, assuming all of them are equally likely.)

Using the value we determined earlier, the "cutoff value" of 10^-150, the information contained in an event E with that probability is -log2 10^-150, which is approximately 500 bits (more precisely, about 498.3). Therefore, another way of specifying the "point of no return" is this: anything with an informational content larger than 500 bits could not have occurred without the intervention of an intelligence.
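A two-line sketch of the conversion, for anyone who wants to check the numbers:

```python
import math

def information_bits(p):
    """I(E) = -log2 P(E): information imparted by an event of probability p."""
    return -math.log2(p)

print(information_bits(10 ** -150))  # about 498.3 bits - roughly 500
print(information_bits(1 / 2))       # a fair coin flip imparts exactly 1 bit
```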

What are possible sources for this information? Well, as far as I know (please let me know if you find another one), only 3 exist: intelligence, randomness, and laws (deterministic functions). Of these 3, the last one doesn't actually create information (more on this below), but can be used to move it from one place to another. (E.g., a program drawing a Rembrandt painting doesn't create the information: it copies it from whatever storage source it is using to the screen or printer.)

Why can't deterministic functions create information? Well, let's see:

P(A & B) = P(A) x P(B) -- where A and B are independent
I(A & B) = I(A) + I(B) -- this follows from I(A) = -log2 P(A)

However, we're interested in information produced by functions. That is, what is I(A & f(A))? Well, f(A) is definitely not independent of A, so we will use I(B | A) for "the (conditional) information of B, given A". From this, we obtain that

I(A & B) = I(A) + I(B | A) -- note that, for independent A and B, I(B | A) = I(B), so we get back to the previous formula

I(A & f(A)) = I(A) + I(f(A) | A)

What is I(f(A) | A)? It will be helpful to get back to probabilities:

I(f(A) | A) = -log2 P(f(A) | A)

But f is a deterministic function - that is, f(A) is completely determined by A; given A, f(A) has a probability of 1:

P(f(A) | A) = 1

which means that

I(f(A) | A) = 0

Getting back to our initial problem,

I(A & f(A)) = I(A) + I(f(A) | A) = I(A)

In other words, we obtained that deterministic functions add no information. (This is common knowledge in the field of cryptography, another hobby of mine, where it's expressed as "functions cannot create entropy".)
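The derivation can also be checked empirically. A sketch (the toy function f and all names are mine): estimate the Shannon entropy of a random sample A and of the pairs (A, f(A)); since f is deterministic, the joint entropy equals the entropy of A alone:

```python
import math
import random
from collections import Counter

def entropy_bits(samples):
    """Empirical Shannon entropy, in bits, of a list of observations."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

random.seed(1)
A = [random.randrange(16) for _ in range(100_000)]  # roughly 4 bits per sample

def f(a):
    return (3 * a + 7) % 16  # an arbitrary deterministic function

h_a = entropy_bits(A)
h_joint = entropy_bits([(a, f(a)) for a in A])  # joint entropy of (A, f(A))
print(round(h_a, 4), round(h_joint, 4))  # the values coincide: f added nothing
```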

We have now an interesting result. Functions cannot add information. Randomness can, but no more than 500 bits. So the only conclusion we can draw is this: any piece of information more than 500 bits long (or, any event with a probability lower than 10^-150) must have an intelligent cause.

It is important that we keep thinking about the issue in terms of both information and probability, or we might get confused. For example, someone objected that it is possible to accumulate any amount of information "piece by piece", one bit at a time: none of the individual bits is outside the realm of randomness, but together they can break the 500-bit barrier. Or, it could be objected that a random process - like flipping a coin - can easily generate 500 random bits. We need to get back to the first part of this (Statistics) and talk about probabilities again. (The answer to both objections seems obvious to me; however, if you're of a different opinion, email me and I'll detail what I mean.)


I came across this gem on the TrueOrigin list:

I asked her to pick a digit (0 through 9). Then do it again, and again, 53 times. You will have picked a number with 53 digits. The apriori probability that you would have picked that number is 1 chance in 10^53, but you did the impossible.

Is it true that the chance of doing what is described above is indeed very small (10^-53)? Let's see.

In my country, we have a lottery called "6 out of 49". You have a 7x7 grid, with numbers from 1 to 49, out of which you are supposed to pick the winning six. Let's say you bought a ticket and picked 6 numbers at random. What is the chance of winning the lottery?

Well, the total number of combinations is C(49, 6), which is 49! / (6! x (49 - 6)!), where "n!" (read: n factorial) means "1 x 2 x 3 x ... x n". Calculating it gives us 13,983,816 possibilities (if I made a mistake, please let me know). So, the probability of picking the right combination is approximately 1 / 14 million, or 7.15 x 10^-8.
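The count is a one-liner to verify in Python:

```python
from math import comb

# number of ways to choose 6 numbers out of 49
total = comb(49, 6)  # equivalently 49! / (6! * 43!)
print(total)         # 13983816
print(1 / total)     # about 7.15e-08
```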

Now, what is the probability that you picked 6 random numbers? Put this way, the question sounds silly: the probability is 1, of course. You just did it; it's a past event - it either happened or it didn't, so the probability is either 1 or 0. And this is what the above "experiment" is all about: picking 53 random digits, and then asking "how probable is it that you in fact picked 53 random digits?", can only have one answer: the probability is 1.

How do we distinguish between the two cases? We need what is called a specification. The second case had one such specification: the winning combination. In other words, we must be able to ask the question: what is the probability that the sequence we just picked (call it x) is equal to T? We can do that in the second case; but in the first, we have nothing to replace T with - except for x, in which case the question becomes "what is the probability that x = x?". Once we have a specification, we can calculate the probability. Once we have that, we can determine whether the event was random or not: anything below a specific value (we'll use 10^-150) cannot be random. (More on this value later.)

What conditions must the specification meet? Well, only one: it must be independent. To borrow an example from Dembski, an archer who shoots at a wall, then paints a bullseye around the arrow is not a very good archer. (The probability of hitting the target - the probability of x being equal to T, to use the symbols above - is 1, since T is defined by x. The two are anything but independent.) On the other hand, if the bullseye is already painted, and the archer hits it 20 times right in the center, then we can be quite sure this was no random event. (I am, of course, ignoring various ways to "rig" this, like a bullseye 20 meters across with the archer standing two feet in front of it, and other variations.)

Should the specification occur before the event? No, this is not necessary. A process generating 3141592 will hint that it's not random. A process generating the first 100 digits of pi will absolutely tip us off. An independent specification is enough to calculate probabilities, whether it's made known before or after the fact.
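A small sketch of the difference the specification makes (T and the digit alphabet are from the examples above): with an independent T in hand, the probability of a random digit string matching it is well-defined.

```python
import random

T = "3141592"  # an independent specification: the first 7 digits of pi

# a freshly generated random digit string of the same length
x = "".join(random.choice("0123456789") for _ in T)

# P(x == T) is well-defined only because T does not depend on x;
# without an independent T, we could only ask P(x == x) = 1
p_match = (1 / 10) ** len(T)
print(p_match)  # 10^-7
```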

Can we make mistakes? In other words, is it possible to consider that an event is random when it isn't, or the other way around?

1. We might not realize that there is a specification. A process generating 2030481... might seem random, until we realize that each digit is the corresponding digit of pi, minus one. Or, some message might be encrypted with a good algorithm, and the resulting string would appear random to all statistical tests (like Diehard). False negatives are a possibility.

2. On the other hand, I don't see any way for false positives. If the "cutoff value" is low enough - like the 10^-150 I'm using (which I borrowed from Dembski) - there is no way it can happen randomly (that is, without an intelligence involved). [I am not a subscriber to the QM "anything possible is necessary" many-worlds interpretation.] Of course, with a cutoff value too large - like 1 in 10 - false positives can happen. This is why Dembski picked 10^-150, to avoid any possibility of a false positive.

Finally, why 10^-150?

It is estimated that there are about 10^80 particles in the known universe.
The Planck time is the smallest meaningful unit of time: 10^-45 seconds. This means that nothing physical can change state more than 10^45 times per second.
Finally, 10^25 seconds is more than a billion times the universe's estimated age.

Multiplying these figures, 10^80 x 10^45 x 10^25 gives 10^150. This means that, if all the particles in the known universe, for a billion times longer than they (allegedly [1]) existed, "tried" to generate new combinations, they wouldn't have exhausted 10^150 such combinations. That is, there are simply not enough resources in this universe to account for an event with a probability smaller than 10^-150. Which means that, if we encounter an event whose probability (reminder: we need an independent specification to be able to calculate this probability) is smaller than 10^-150, an intelligence must have been involved.
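The multiplication itself is easy to verify exactly, since Python integers are unbounded:

```python
particles = 10 ** 80  # estimated particles in the known universe
rate = 10 ** 45       # maximum state changes per second (1 / Planck time, per the text)
seconds = 10 ** 25    # the generous time bound used above

print(particles * rate * seconds == 10 ** 150)  # True
```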

In other words, if P(E) < 10^-150, E was caused by an intelligence.


[1] I'm a young-earth creationist. I believe the universe to be less than 10,000 years old.