Wednesday, August 20, 2003

Human DNA

Is the probability of obtaining a DNA sequence that codes for a human (i.e., a living being capable of interbreeding with humans) by any combination of random processes and deterministic functions (like natural selection) less than 10^-150?

Let's assume we already have the humans' alleged ancestor race, call it apes. Is it possible for random mutations to change an ape DNA into a human DNA?

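Let S = {A, C, G, T} be the set of possible values for a nucleotide (a single DNA "letter"; strictly speaking, a codon is a triplet of these), and let S* be the set of sequences s[1] ... s[n] where each s[i] is in S. We define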

dist* (a, b) with a, b in S* = the number of point mutations needed to change a into b (or vice versa):

dist* (a[1] ... a[n], b[1] ... b[m]) = sum (i = 1, n, dist (a[i], b[i])) + m - n, with m >= n >= 1

dist (a, b) = {0 if a = b, 1 otherwise}, with a, b in S
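As a rough illustration, the two definitions above translate almost directly into Python. The function names dist and dist_star are my own, and the swap at the top only enforces the m >= n condition from the definition; treat this as a sketch, not as anything more authoritative than the formulas themselves.

```python
def dist(a: str, b: str) -> int:
    """Distance between two single letters: 0 if equal, 1 otherwise."""
    return 0 if a == b else 1


def dist_star(a: str, b: str) -> int:
    """Positional mismatches over the shorter length, plus the length difference."""
    if len(a) > len(b):
        a, b = b, a  # enforce m >= n, as the definition requires
    n, m = len(a), len(b)
    return sum(dist(a[i], b[i]) for i in range(n)) + (m - n)


# One mismatch in the first four letters, plus two extra letters.
print(dist_star("ACGT", "ACCTGG"))  # 3
```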

In other words, if everything works out perfectly, it takes at least dist* (A, B) point mutations to convert one into the other, where A is a member of the ape DNA set, and B is a member of the human DNA set.

How close are these sets? It is alleged that they are very close: that the resemblance is between 95% and 99%. Out of 3 billion base pairs, this would give a difference of 30 to 150 million base pairs. If we go further and claim that only 1 in 10,000 of the DNA is actually relevant (the rest being "junk DNA"), we are left with a relevant difference of only 3 to 15 thousand "letters".
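The arithmetic behind those figures can be checked in a few lines. The constant and function names are mine, and the inputs are just the numbers quoted in the paragraph above (3 billion letters, 95% to 99% resemblance, 1 in 10,000 relevant), not independent data.

```python
GENOME_LENGTH = 3_000_000_000   # letters, as quoted above
RELEVANT_ONE_IN = 10_000        # "1 in 10,000 is relevant", as quoted above


def differences(similarity_percent: int) -> tuple[int, int]:
    """Return (total differing letters, relevant differing letters)."""
    differing = GENOME_LENGTH * (100 - similarity_percent) // 100
    return differing, differing // RELEVANT_ONE_IN


for percent in (99, 95):
    total, relevant = differences(percent)
    print(f"{percent}% similar: {total:,} differing, {relevant:,} relevant")
```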

This means that we need between 3 and 15 thousand point mutations to get from apes to humans. How probable is that, using purely random mechanisms? Well, taking the low end, the problem is equivalent to getting from AAA...A to CCC...C, both strings having 3,000 letters, changing one letter at each step into one of the values A, C, G, or T, at random. At each step, the probability of getting the right one is 1/4; since we need 3,000 lucky changes, the probability of getting them through purely random means is 1/4^3,000 = 2^-6,000. This is WAY below 2^-500, which is roughly the 10^-150 from the opening question.
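For anyone who wants to check the exponents, here is a small sketch. It only reproduces the model stated above (3,000 independent steps, each right with probability 1/4); the variable names are mine.

```python
import math
from fractions import Fraction

STEPS = 3000

# Exact check that (1/4)^3000 equals 2^-6000.
p = Fraction(1, 4) ** STEPS
print(p == Fraction(1, 2 ** 6000))  # True

# The same numbers on a base-10 log scale, for comparison with 10^-150.
log10_p = -STEPS * math.log10(4)      # exponent of (1/4)^3000
log10_bound = -500 * math.log10(2)    # exponent of 2^-500
print(round(log10_p, 1), round(log10_bound, 1))
```

So 2^-500 sits just under 10^-150, and 2^-6,000 is around 10^-1806.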

What about the Weasel program? (You can find an example here.) Couldn't such a program get much faster from AAA...A to CCC...C? Yes, of course it could. But the problem with the weasel program - or any similar algorithm - is that it already knows its target. In other words, the weasel program is equivalent to the following one:

print "CCC...C"
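To make the objection concrete, here is a minimal weasel-style sketch of my own (not the program linked above). The names, the mutation rate, the number of offspring, and the choice to keep the parent in the pool are all assumptions made for illustration. The thing to notice is that score() compares every candidate against target, so the target string is an input to the search.

```python
import random

ALPHABET = "ACGT"


def weasel(target: str, start: str, offspring: int = 50,
           rate: float = 0.05, seed: int = 0) -> int:
    """Run a weasel-style search; return the number of generations used."""
    rng = random.Random(seed)

    def score(candidate: str) -> int:
        # The score is defined by comparison with the known target.
        return sum(x == y for x, y in zip(candidate, target))

    current = start
    generations = 0
    while current != target:
        children = [
            "".join(rng.choice(ALPHABET) if rng.random() < rate else c
                    for c in current)
            for _ in range(offspring)
        ]
        # Keep whichever of parent and children is closest to the target.
        current = max(children + [current], key=score)
        generations += 1
    return generations


print(weasel(target="C" * 30, start="A" * 30))
```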

Unless someone can convincingly argue that evolution's goal was to transform apes into humans, any algorithm with a predefined target (which can be reduced to the above) is out. This includes more "roundabout" algorithms that try to conceal the fact that they know their target, like:

for i = 1 to 3000
    print chr(-i + ord("A") + (2 * i + 6) / 2 - 1)
next i
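A quick check that the roundabout loop really is the same thing in disguise: the expression simplifies to -i + ord("A") + (i + 3) - 1 = ord("A") + 2, which is "C" for every i. The Python below is my own translation of the pseudocode (roundabout_char is a made-up name), using integer division since 2 * i + 6 is always even.

```python
def roundabout_char(i: int) -> str:
    """Python translation of the expression in the loop above."""
    return chr(-i + ord("A") + (2 * i + 6) // 2 - 1)


output = "".join(roundabout_char(i) for i in range(1, 3001))
print(output == "C" * 3000)  # True
```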

What about using an evaluation function (for the natural selection part)? Such functions can serve as probability amplifiers. (They lose information instead of adding any, but that's OK in this case: we already have a source of information, randomness; we now need something to "guide" that information.)

Well, we have two relevant classes of such functions:
1) functions that know the target - and we're back to the above case - and
2) functions that calculate a "fitness", and it so happens that the fitness of the CCC...C string is greatest, while the fitness of the AAA...A string is lowest.

Option 2 is a viable one, provided that someone produces such a function for the real-life case of human DNA. (I know it's easy to do it for our strings.) Can this be done without using any post-facto rationalization (like "humans are smarter than apes, therefore the function must give them more points")? [Remember, this must work for any pair of ancestor and evolved species. Such a function won't help with bacteria. The No Free Lunch theorems would suggest that there is no such function that works for all cases.]
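For the toy strings, an option-2 function might look like the sketch below. Everything in it is an assumption chosen for illustration: the fitness is simply the number of C's, one random letter changes per step, and a mutant is kept if it is at least as fit as its parent. It says nothing about whether a comparable function exists for real DNA, which is the open question here.

```python
import random

ALPHABET = "ACGT"


def fitness(s: str) -> int:
    """Toy fitness (an assumption): the number of C's in the string."""
    return s.count("C")


def evolve(start: str, generations: int, seed: int = 0) -> str:
    """Single-letter random mutation with selection on fitness()."""
    rng = random.Random(seed)
    current = start
    for _ in range(generations):
        i = rng.randrange(len(current))
        mutant = current[:i] + rng.choice(ALPHABET) + current[i + 1:]
        # Keep the mutant only if it is at least as fit as its parent.
        if fitness(mutant) >= fitness(current):
            current = mutant
    return current


result = evolve("A" * 20, generations=5000)
print(fitness("A" * 20), fitness(result))
```

CCC...C scores highest and AAA...A scores lowest here, exactly as option 2 requires, but only because the function was picked with these two strings in mind.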

So, the final conclusion would be: if evolution's goal is not to produce humans, and if there's no general algorithm that nevertheless happens to favor "closer-to-human DNA" over "closer-to-ape DNA", human DNA can't have occurred by chance - even with the help of natural selection.
