Information Theory and Creationism: William Dembski


Other Links:

A Mathematical Theory of Communication: The 1948 paper that founded Information Theory, by mathematician Claude E. Shannon of Bell Labs.
G J Chaitin Home Page: The home page of one of Algorithmic Information Theory's inventors, Gregory Chaitin. Many of his papers are available here.
Kolmogorov Complexity and Solomonoff Induction: A web site with numerous references and links (some expired) on the ideas of Kolmogorov and Solomonoff.
The No Free Lunch FAQ: Richard Wein's critique of Dembski's book, No Free Lunch.
The AntiEvolutionists: William A. Dembski: Information and critical commentary about Dembski's ideas, by Wesley Elsberry.
Information Theory, Evolutionary Computation, and Dembski's “Complex Specified Information”: Critical review by Wesley Elsberry and Jeffrey Shallit.
Intelligent Design as a Theory of Information: An article by mathematician and philosopher William Dembski, author of several books on Intelligent Design.

Introduction

The information ideas of Dembski, a highly verbose author, are highlighted in Intelligent Design as a Theory of Information ^[4]. The purpose of the paper is to "(1) show how information can be reliably detected and measured, and (2) formulate a conservation law that governs the origin and flow of information." One immediately discerns that Dembski is talking about something quite different from Shannon, Chaitin, and Kolmogorov when he states in the second paragraph that "neither algorithms nor natural laws are capable of producing information."

Information Defined

Dembski defines information as follows:

For there to be information, there must be a multiplicity of distinct possibilities any one of which might happen. When one of these possibilities does happen and the others are ruled out, information becomes actualized. Indeed, information in its most general sense can be defined as the actualization of one possibility to the exclusion of others (observe that this definition encompasses both syntactic and semantic information).

Dembski then goes on to propose -log₂ p as the measure of information for an event with probability p, precisely the same usage as Shannon. Dembski's definition, however, applies to a single event, whereas Shannon was concerned with the average information of a source, taken over the statistical ensemble of its possible output sequences.
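The difference between the single-event measure and Shannon's average can be made concrete with a short sketch. The function names and the four-symbol source below are illustrative assumptions, not anything taken from Dembski's or Shannon's papers: self_information gives -log₂ p for one event, while entropy gives the average over a whole source.

```python
import math

def self_information(p):
    """Bits associated with a single event of probability p: -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

def entropy(distribution):
    """Shannon's average information (bits per symbol) of a source."""
    return sum(p * self_information(p) for p in distribution if p > 0)

# Hypothetical four-symbol source, chosen only for illustration.
source = [0.5, 0.25, 0.125, 0.125]
per_event = [self_information(p) for p in source]  # one value per event
average = entropy(source)                          # one value per source
```

The first quantity yields a separate number for each event; the second yields a single number that characterizes the source as a whole.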

Complexity

Next, Dembski describes complex information and claims that the measure -log₂p is a complexity measure.

Information is a complexity-theoretic notion. Indeed, as a purely formal object, the information measure described here is a complexity measure (cf. Dembski, 1998, ch. 4). Complexity measures arise whenever we assign numbers to degrees of complication. A set of possibilities will often admit varying degrees of complication, ranging from extremely simple to extremely complicated. Complexity measures assign non-negative numbers to these possibilities so that 0 corresponds to the most simple and _ [sic] to the most complicated. For instance, computational complexity is always measured in terms of either time (i.e., number of computational steps) or space (i.e., size of memory, usually measured in bits or bytes) or some combination of the two. The more difficult a computational problem, the more time and space are required to run the algorithm that solves the problem. For information measures, degree of complication is measured in bits. Given an event A of probability P(A), I(A) = -log₂P(A) measures the number of bits associated with the probability P(A). We therefore speak of the "complexity of information" and say that the complexity of information increases as I(A) increases (or, correspondingly, as P(A) decreases). We also speak of "simple" and "complex" information according to whether I(A) signifies few or many bits of information. This notion of complexity is important to biology since not just the origin of information stands in question, but the origin of complex information.

Note that, for some reason, Dembski mentions computational complexity here but does not specifically link it to an information measure. That is just as well, because the two concepts are not the same, but the implied connection between them is questionable.

Complex Specified Information

Dembski goes on to distinguish between specified and unspecified complex information:

Now the information that tends to interest us as rational inquirers generally, and scientists in particular, is not the actualization of arbitrary possibilities which correspond to no patterns, but rather the actualization of circumscribed possibilities which do correspond to patterns. There's more. Patterned information, though a step in the right direction, still doesn't quite get us specified information. The problem is that patterns can be concocted after the fact so that instead of helping elucidate information, the patterns are merely read off already actualized information...

Specified information is always patterned information, but patterned information is not always specified information. For specified information not just any pattern will do. We therefore distinguish between the "good" patterns and the "bad" patterns. The "good" patterns will henceforth be called specifications. Specifications are the independently given patterns that are not simply read off information. By contrast, the "bad" patterns will be called fabrications. Fabrications are the post hoc patterns that are simply read off already existing information...

The distinction between specified and unspecified information may now be defined as follows: the actualization of a possibility (i.e., information) is specified if independently of the possibility's actualization, the possibility is identifiable by means of a pattern. If not, then the information is unspecified. Note that this definition implies an asymmetry between specified and unspecified information: specified information cannot become unspecified information, though unspecified information may become specified information. Unspecified information need not remain unspecified, but can become specified as our background knowledge increases. For instance, a cryptographic transmission whose cryptosystem we have yet to break will constitute unspecified information. Yet as soon as we break the cryptosystem, the cryptographic transmission becomes specified information.

For the mathematical treatment, the reader is referred to Dembski's 1998 book "The Design Inference."

Dembski proposes two subsidiary conditions to the independence condition between patterns and information:

(1) a condition to stochastic conditional independence between the information in question and certain relevant background knowledge; and

(2) a tractability condition whereby the pattern in question can be constructed from the aforementioned background knowledge.

He notes that neither condition is easily formalized. He does, however, state that it is easy to determine in practice whether a pattern is given independently of a possibility, "if the pattern is given prior to the possibility being actualized." Life, he states, is a case in which a pattern is given after a possibility is actualized, yet he claims it also represents complex specified information or CSI.

Dembski attributes a number of properties to CSI:

It is CSI that for Manfred Eigen constitutes the great mystery of biology, and one he hopes eventually to unravel in terms of algorithms and natural laws. It is CSI that for cosmologists underlies the fine-tuning of the universe, and which the various anthropic principles attempt to understand (cf. Barrow and Tipler, 1986). It is CSI that David Bohm's quantum potentials are extracting when they scour the microworld for what Bohm calls "active information" (cf. Bohm, 1993, pp. 35-38). It is CSI that enables Maxwell's demon to outsmart a thermodynamic system tending towards thermal equilibrium (cf. Landauer, 1991, p. 26). It is CSI on which David Chalmers hopes to base a comprehensive theory of human consciousness (cf. Chalmers, 1996, ch. 8). It is CSI that within the Kolmogorov-Chaitin theory of algorithmic information takes the form of highly compressible, non-random strings of digits (cf. Kolmogorov, 1965; Chaitin, 1966).

Dembski goes on to argue that intelligent causation or design is necessary for CSI to arise. He argues that design can be detected:

The actualization of one among several competing possibilities, the exclusion of the rest, and the specification of the possibility that was actualized encapsulates how we recognize intelligent causes, or equivalently, how we detect design.

Intelligent Design

Dembski claims that CSI is an indicator of intelligent causation or intelligent design. He argues that CSI cannot result from a combination of chance and necessity, and that it indicates an improbable event was therefore intelligently chosen:

If chance and necessity left to themselves cannot generate CSI, is it possible that chance and necessity working together might generate CSI? The answer is No. Whenever chance and necessity work together, the respective contributions of chance and necessity can be arranged sequentially. But by arranging the respective contributions of chance and necessity sequentially, it becomes clear that at no point in the sequence is CSI generated. Consider the case of trial-and-error (trial corresponds to necessity and error to chance). Once considered a crude method of problem solving, trial-and-error has so risen in the estimation of scientists that it is now regarded as the ultimate source of wisdom and creativity in nature. The probabilistic algorithms of computer science (e.g., genetic algorithms-see Forrest, 1993) all depend on trial-and-error. So too, the Darwinian mechanism of mutation and natural selection is a trial-and-error combination in which mutation supplies the error and selection the trial. An error is committed after which a trial is made. But at no point is CSI generated.

Dembski summarizes CSI as an indicator of design:

This argument for showing that CSI is a reliable indicator of design may now be summarized as follows: CSI is a reliable indicator of design because its recognition coincides with how we recognize intelligent causation generally. In general, to recognize intelligent causation we must establish that one from a range of competing possibilities was actualized, determine which possibilities were excluded, and then specify the possibility that was actualized. What's more, the competing possibilities that were excluded must be live possibilities, sufficiently numerous so that specifying the possibility that was actualized cannot be attributed to chance. In terms of probability, this means that the possibility that was specified is highly improbable. In terms of complexity, this means that the possibility that was specified is highly complex. All the elements in the general scheme for recognizing intelligent causation (i.e., Actualization-Exclusion-Specification) find their counterpart in complex specified information-CSI. CSI pinpoints what we need to be looking for when we detect design.

Conservation of Information

Finally, Dembski proposes a Law of Conservation of Information:

This strong proscriptive claim, that natural causes can only transmit CSI but never originate it, I call the Law of Conservation of Information

which he holds has the following corollaries:

(1) The CSI in a closed system of natural causes remains constant or decreases.

(2) CSI cannot be generated spontaneously, originate endogenously, or organize itself (as these terms are used in origins-of-life research).

(3) The CSI in a closed system of natural causes either has been in the system eternally or was at some point added exogenously (implying that the system though now closed was not always closed).

(4) In particular, any closed system of natural causes that is also of finite duration received whatever CSI it contains before it became a closed system.

Where Dembski Goes Wrong

In attempting to define complex information, Dembski conflates a definition of information from Classical Information Theory (probability) with a modified definition from Algorithmic Information Theory (computational length, or Kolmogorov complexity). Recall that Dembski defined information as -log₂ p, where p represents the probability of an event. This is essentially Shannon's usage in Classical Information Theory. On the other hand, he goes on to state that -log₂ p is a complexity measure:

Information is a complexity-theoretic notion. Indeed, as a purely formal object, the information measure described here is a complexity measure... Given an event A of probability P(A), I(A) = -log₂P(A) measures the number of bits associated with the probability P(A). We therefore speak of the "complexity of information" and say that the complexity of information increases as I(A) increases (or, correspondingly, as P(A) decreases).

So far, Dembski has applied Kolmogorov complexity to the Shannon information resulting from a single event. At least, other parts of the article imply he means Kolmogorov complexity, and he doesn’t state otherwise in this paragraph. Mathematically, there is nothing wrong with this, though the usefulness isn’t very clear at all. There is a big error in the final half of the last sentence, however:

…the complexity of information increases as I(A) increases (or, correspondingly, as P(A) decreases)

This is wrong. In general, there is no relationship whatsoever between the Kolmogorov complexity of a string and its probability of occurrence. Kolmogorov complexity of a string is the length of the shortest program on a reference Universal Turing Machine or UTM (a sort of generalized computer) that will produce that string. It depends on two things: (1) the contents of the string, and (2) the reference computer, neither of which relate to the probability of the string’s occurrence. There is an infinite number of UTMs to choose from. Given an arbitrary finite string, we can find a UTM on which the Kolmogorov complexity of the string is arbitrarily low or arbitrarily high. Nature has no preference for one UTM over another.

Furthermore, if the contents of a string depended on its probability of occurrence then Classical Information Theory, which has made possible all manner of modern telecommunications, wouldn’t be necessary. One of the basic concepts in Shannon’s theory is that use of an information channel is maximized by recoding the messages so that more probable messages are shorter and less probable messages are longer. Fano-Shannon codes are an example of this. If the number of bits in a string were inherently related to probability, we wouldn’t need to recode them and Fano-Shannon codes would be useless.
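A minimal sketch of this recoding idea follows. It uses Huffman's construction rather than a Fano-Shannon code, purely because it is compact to write; both assign shorter codewords to more probable messages. The function name and the message probabilities are assumptions made for illustration.

```python
import heapq

def huffman_code_lengths(probabilities):
    """Return {symbol: code length in bits} for a Huffman prefix code."""
    # Heap entries: (probability, tie_breaker, {symbol: depth so far}).
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical message probabilities, chosen only for illustration.
messages = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
lengths = huffman_code_lengths(messages)
```

The code lengths are assigned by the encoder from the probabilities. Nothing about the messages themselves fixes how many bits they occupy, which is why the recoding step is needed at all.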

Dembski makes a fantastic leap in assuming that an information metric derived from the probability of a single event (-log₂ p) and the shortness of the minimum algorithm needed to represent the event (Chaitin-Kolmogorov) are necessarily related. Unsurprisingly, the mathematical rigor supporting Dembski's case is lacking. In fact, there is no reason to conclude that any relationship between event probability and Kolmogorov complexity exists for an arbitrary information source.

As a side note, Kolmogorov complexity has the disadvantage of being uncomputable and hence makes a poor metric.

Of course, Dembski is rather vague about whether he really means Kolmogorov complexity here. It is possible that he is simply calling the Shannon information metric a complexity measure, in which case we would have to wonder why he is changing the terminology without warning. Even if this is what Dembski means, the problem of assuming that a string's probability is inherently related to its contents remains.

Another big error is found in this passage:

It is CSI that within the Kolmogorov-Chaitin theory of algorithmic information takes the form of highly compressible, non-random strings of digits

Dembski has it backwards. In the Algorithmic Information Theory of Chaitin and Kolmogorov, highly compressible strings have low complexity and information content, not high; complexity increases with greater algorithmic randomness. For example, on a general-purpose computer, we expect the program “print 1 million zeros” to take far fewer than 1 million bits to describe. Here is a highly compressible string with very little complexity and very little information content. On the other hand, it is generally very difficult to compress random noise. As a mathematician, Dembski ought to know this. (We could, of course, find a grossly inefficient computer that is nonetheless Universal, on which the program “print 1 million zeros” requires many more than 1 million bits to code, which is another reason to avoid Kolmogorov complexity when analyzing strings.)
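The contrast between a run of zeros and random noise can be illustrated with an ordinary compressor. This is only a rough sketch: the output size of zlib is an upper bound relative to one particular compression scheme, not the Kolmogorov complexity itself (which is uncomputable), and the variable names are assumptions for illustration.

```python
import os
import zlib

n = 125_000  # one million bits, expressed as bytes

zeros = bytes(n)       # one million zero bits
noise = os.urandom(n)  # random bytes supplied by the operating system

compressed_zeros = zlib.compress(zeros, 9)
compressed_noise = zlib.compress(noise, 9)

# The zeros shrink to a tiny fraction of their size; the noise does not shrink.
print(len(compressed_zeros), len(compressed_noise))
```

The highly patterned input is the one with the short description, which is the opposite of what the quoted passage suggests.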

The roots of another big mistake are here:

Specified information is always patterned information, but patterned information is not always specified information. For specified information not just any pattern will do. We therefore distinguish between the "good" patterns and the "bad" patterns. The "good" patterns will henceforth be called specifications. Specifications are the independently given patterns that are not simply read off information. By contrast, the "bad" patterns will be called fabrications. Fabrications are the post hoc patterns that are simply read off already existing information...

The implication is that, given two patterns, we can somehow know which pattern caused the other. He continues:

The distinction between specified and unspecified information may now be defined as follows: the actualization of a possibility (i.e., information) is specified if independently of the possibility's actualization, the possibility is identifiable by means of a pattern. If not, then the information is unspecified.

Here we have another great leap in Dembski's assumption that, if a pattern exists prior to a possibility being actualized, it must be causal. Anyone trained in statistics must know the danger in making assumptions of causality simply because a correlation between two variables is noted. Correlations do not imply causality between the correlated variables. For example, the correlation between malaria and swamps was observed long ago. The disease malaria was incorrectly attributed to the bad (mal) air (aria) near swamps. The correlation was correctly noted, but the assumption of causality was flawed. While a correlated pattern and actualized possibility may have a related cause, one cannot assume the pattern caused the actualized possibility. Accordingly, this proposal for detecting design is highly suspect. This is a critical error.

To make matters worse, Dembski relies on knowing the probability of a single event occurring. It is not in general possible to know the probability associated with a single event. One must have statistical knowledge of the process from which the event arose to know its probability with certainty; or a sufficiently large number of samples to estimate it. Pretending the probability somehow depends on the event's Kolmogorov complexity doesn't help.
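The dependence on sample size can be sketched with a simulated process. Everything here is hypothetical: the process, its true probability of 0.3, and the function name are assumptions chosen only to show how little a single observation reveals.

```python
import random

def estimate_probability(samples):
    """Relative frequency of the event (True) among observed samples."""
    return sum(samples) / len(samples)

rng = random.Random(0)  # fixed seed so the run is repeatable
true_p = 0.3            # hypothetical process, unknown to the observer

one_sample = [rng.random() < true_p]
many_samples = [rng.random() < true_p for _ in range(10_000)]

single_estimate = estimate_probability(one_sample)   # can only be 0.0 or 1.0
large_estimate = estimate_probability(many_samples)  # close to true_p
```

A single observation can only yield an estimate of 0 or 1, whatever the underlying process; only the larger sample says anything useful about the probability.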

His argument that CSI cannot be created by a combination of chance and necessity (or mutation and natural selection) is an argument from ignorance. Dembski asserts that it cannot be done, but fails to demonstrate why. He also implies CSI will provide information about excluded possibilities, without showing how.

The farther Dembski goes, the more he resorts to arm-waving. He provides no evidence to support his so-called Law of Conservation of Information, which he admits is a "strong proscriptive claim." Since it is a claim and not a law, any arguments based on it can and should be rejected as pseudo-science.

It is interesting to contrast the stated purpose of the article:

(1) show how information can be reliably detected and measured, and (2) formulate a conservation law that governs the origin and flow of information

with his statements further on in the article as he gets to the heart of his argument:

This is a vast topic whose full elucidation is beyond the scope of this paper (the details can be found in my monograph The Design Inference).

and

The aim of this last section is briefly to sketch the Law of Conservation of Information (a full treatment will be given in Uncommon Descent, a book I am jointly authoring with Stephen Meyer and Paul Nelson).

In other words, after first claiming that a profound new principle will be formulated in the article, and having loaded up a long article with glib arm-waving lacking in details, Dembski directs the reader to his next book. He seems to have changed direction, however. He has edited a similarly titled book, Uncommon Dissent: Intellectuals who find Darwinism Unconvincing, but it contains only a foreword by Dembski. His “Law of Conservation of Information” is expounded in No Free Lunch. For a critical review of this book, see The No Free Lunch FAQ.
