"We have obtained estimates of genetic differentiation between humans and the great apes no greater than, say, those observed between physically indistinguishable sibling species of fruit flies."
Elizabeth J. Bruce and Francisco J. Ayala
"Humans And Apes Are Genetically Very Similar,"
Nature 276:264, Nov. 16, 1978
The molecular sequence evidence gives the most impressive and irrefutable evidence for the genealogical relatedness of all life. The nature of molecular sequences allows for extremely impressive probability calculations that demonstrate how well the predictions of common descent with modification actually match empirical observation. Common descent is a deduction that directly follows from premises based on empirically observed molecular evidence. In addition, knowledge of biological molecular mechanisms and structures, combined with macroevolutionary theory, has given very specific, novel, and testable biomolecular predictions.
The support for common descent given by studies of molecular sequences can be phrased as a deductive argument. This argument is unique within this FAQ, as it is the only instance in which we can directly conclude that similarity implies relatedness. This conclusion depends upon the similarity of biological structures within a specific context: the similarity observed between ubiquitous genes from different species.
The following discussion is somewhat technical, so it is first presented as the outline of a deductive argument, which makes the logical thread easy to follow. The premises of the argument are listed first, followed by the conclusion and further discussion.
(P1) Ubiquitous genes: There are certain genes that all living organisms have because they perform very basic life functions; these genes are called ubiquitous genes.
(P2) Ubiquitous genes are uncorrelated with species-specific phenotypes: Ubiquitous genes have no relationship with the specific functions of different species. For example, it doesn't matter whether you are a bacterium, a human, a frog, a whale, a hummingbird, a slug, a fungus, or a sea anemone - you have these ubiquitous genes, and they all perform the same basic biological function no matter what you are.
(P3) Molecular sequences of ubiquitous genes are functionally redundant: Any given ubiquitous protein has an extremely large number of different functionally equivalent forms (i.e. protein sequences which can perform the same biochemical function).
(P4) Specific ubiquitous genes are unnecessary in any given species: Obviously, there is no a priori reason why every organism should have the same sequence or even similar sequences. No specific sequence is functionally necessary in any organism - all that is necessary is one of the large number of functionally equivalent forms of a given ubiquitous gene or protein.
(P5) Heredity correlates sequences, even in the absence of functional necessity: There is one, and only one, observed mechanism which causes two different organisms to have ubiquitous proteins with similar sequences (aside from the extreme improbability of pure chance, of course). That mechanism is heredity.
(C) Thus, similar ubiquitous genes indicate genealogical relationship: It follows that organisms which have similar sequences for ubiquitous proteins are genealogically related. Roughly, the more similar the sequences, the closer the genealogical relationship.
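The phrase "the more similar the sequences" can be made concrete with the simplest distance measure, the proportion of aligned positions that differ (often called the p-distance). The Python sketch below assumes the sequences are already aligned to equal length; the function name and the short sequences are hypothetical illustrations, not real gene data.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned positions at which two sequences differ.

    Assumes the sequences are already aligned to the same length; gap
    characters are treated like any other character (no special handling).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return mismatches / len(seq1)


# Hypothetical aligned fragments, not real cytochrome c data.
ref = "ACDEFGHIKL"
near = "ACDEFGHIKV"   # differs at 1 of 10 positions
far = "ACNEFSHIRV"    # differs at 4 of 10 positions

print(p_distance(ref, near))  # 0.1
print(p_distance(ref, far))   # 0.4
```

A smaller p-distance means more similar sequences. Real analyses usually correct this raw proportion for repeated substitutions at the same site, which this sketch does not do.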
The amino acid sequences of proteins are often used to establish the phylogenetic relationships of species. Sequence studies with functional genes have centered on genes of proteins (or RNAs) that are ubiquitous (i.e. all organisms have them). This is done to ensure that the comparisons are independent of the overall species phenotype.
For example, suppose we are comparing the protein sequence of a chimpanzee and that of a human. Both of these animals have many similar anatomical characters and functions, so we might expect their proteins to be similar too, regardless of whether they are genealogically related or not. However, we can compare the sequences of very basic genes that are used by all living organisms, such as the cytochrome c gene, which have no influence over specific chimpanzee or human characteristics.
Cytochrome c is an essential and ubiquitous protein found in all organisms, including eukaryotes and bacteria (Voet and Voet 1995, p. 24). The mitochondria of cells contain cytochrome c, where it transports electrons in the fundamental metabolic process of oxidative phosphorylation. The oxygen we breathe is used to generate energy in this process (Voet and Voet 1995, pp. 577-582).
For a ubiquitous gene such as cytochrome c, there is no reason to assume that two different organisms should have the same protein sequence or even similar protein sequences, unless the two organisms are genealogically related. This is due in part to the functional redundancy of protein sequences and structures. Here, "functional redundancy" indicates that many different protein sequences form the same general structure and perform the same general biological role. Cytochrome c is an extremely functionally redundant protein, because many dissimilar sequences all form cytochrome c electron transport proteins. Functional redundancy need not be exact in terms of performance; some functional cytochrome c sequences may be slightly better at electron transport than others.
Decades of biochemical evidence have shown that many amino acid mutations, especially of surface residues, have only small effects on protein function and on protein structure (Branden and Tooze 1999, Ch. 3; Harris et al. 1956; Lesk 2001, Chs. 5 and 6, pp. 165-228; Li 1997, p. 2; Matthews 1996). A striking example is that of the c-type cytochromes from various bacteria, which have virtually no sequence similarity. Nevertheless, they all fold into the same three-dimensional structure, and they all perform the same biological role (Moore and Pettigrew 1990, pp. 161-223; Ptitsyn 1998).
Even within species, most amino acid mutations are functionally silent. For example, there are at least 250 different amino acid mutations known in human hemoglobin, carried by more than 3% of the world's population, that have no clinical manifestation in either heterozygous or homozygous individuals (Bunn and Forget 1986; Voet and Voet 1995, p. 235). The phenomenon of protein functional redundancy is very general, and is observed in all known proteins and genes.
With this in mind, consider again the molecular sequences of cytochrome c. Cytochrome c is absolutely essential for life; organisms that lack it cannot live. It has been shown that the human cytochrome c protein works in yeast (a unicellular organism) that has had its own native cytochrome c gene deleted, even though yeast cytochrome c differs from human cytochrome c at over 40% of its amino acid positions (Tanaka et al. 1988a; Tanaka et al. 1988b; Wallace and Tanaka 1994). In fact, the cytochrome c genes from tuna (fish), pigeon (bird), horse (mammal), Drosophila fly (insect), and rat (mammal) all function in yeast that lack their own native yeast cytochrome c (Clements et al. 1989; Hickey et al. 1991; Koshy et al. 1992; Scarpulla and Nye 1986). Furthermore, extensive genetic analysis of cytochrome c has demonstrated that the majority of the protein sequence is unnecessary for its function in vivo (Hampsey et al. 1986; Hampsey et al. 1988). Only about a third of the roughly 100 amino acids in cytochrome c are necessary to specify its function. Most of the amino acids in cytochrome c are hypervariable (i.e. they can be replaced by a large number of functionally similar amino acids) (Dickerson and Timkovich 1975). Importantly, Hubert Yockey has done a careful study in which he calculated that there are a minimum of 2.3 × 10^93 possible functional cytochrome c protein sequences, based on these genetic mutational analyses (Hampsey et al. 1986; Hampsey et al. 1988; Yockey 1992, Ch. 6, p. 254). For perspective, 10^93 is more than ten trillion (10^13) times larger than the commonly cited estimate of about 10^80 atoms in the visible universe. Thus, functional cytochrome c sequences are virtually unlimited in number, and there is no a priori reason for two different species to have the same, or even mildly similar, cytochrome c protein sequences.
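The size of such numbers follows from a simple product rule: if each position can independently hold any of several residues, the counts multiply. The sketch below is a simplified combinatorial illustration, not Yockey's information-theoretic method; the per-site tolerance profile is hypothetical, and only the 2.3 × 10^93 figure comes from the text. Working in log10 keeps the numbers readable.

```python
import math


def log10_sequence_count(allowed_per_site):
    """log10 of the number of distinct sequences when site i may hold any of
    allowed_per_site[i] residues, treating sites as independent (a simplification)."""
    return sum(math.log10(k) for k in allowed_per_site)


def log10_chance_identical(log10_n):
    """Under a null hypothesis in which each species draws one of N equally
    likely functional sequences at random, two species match with probability 1/N."""
    return -log10_n


# Hypothetical tolerance profile for a 104-residue protein (illustrative only):
# 35 sites that accept a single residue, 69 sites that accept any of 10 residues.
profile = [1] * 35 + [10] * 69
print(f"toy profile: about 10^{log10_sequence_count(profile):.0f} functional sequences")

# Yockey's lower bound, as quoted in the text.
yockey_log = math.log10(2.3e93)
print(f"chance of identity by luck: about 10^{log10_chance_identical(yockey_log):.1f}")

# Compared with the commonly cited ~10^80 atoms in the visible universe.
print(f"ratio to atom count: about 10^{yockey_log - 80:.1f}")
```

Even this modest toy profile yields around 10^69 sequences, which shows why a match between two species by pure chance is not a serious alternative to heredity.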
In terms of a scientific statistical analysis, the "null hypothesis" is that the identity of non-essential amino acids in the cytochrome c proteins from human and chimpanzee should be random with respect to one another. However, from the theory of common descent and our standard phylogenetic tree we know that humans and chimpanzees are quite closely related. We therefore predict, in spite of the odds, that human and chimpanzee cytochrome c sequences should be much more similar than, say, human and yeast cytochrome c - simply due to inheritance.
Humans and chimpanzees have the exact same cytochrome c protein sequence, so the "null hypothesis" given above is rejected. In the absence of common descent, the chance of this occurrence is conservatively less than 10^-93 (1 out of 10^93). Thus, the high degree of similarity in these proteins is a spectacular corroboration of the theory of common descent. Furthermore, human and chimpanzee cytochrome c proteins differ by ~10 amino acids from all other mammals. The chance of this occurring in the absence of a hereditary mechanism is less than 10^-29. The yeast Candida krusei is one of the most distantly related eukaryotic organisms from humans. Candida has 51 amino acid differences from the human sequence. A conservative estimate of this probability is less than 10^-25.
One possible, yet unlikely, objection is that the slight differences in functional performance between the various cytochromes could be responsible for this sequence similarity. This objection is unlikely because of the incredibly high number of nearly equivalent sequences that would be phenotypically indistinguishable for any required level of performance. Additionally, nearly identical sequences do not necessarily give nearly identical levels of performance.
Nonetheless, for the sake of argument, let us assume that a cytochrome c that transports electrons faster is required in organisms with active metabolisms or with high rates of muscle contraction. If this were true, we might expect to observe a pattern of sequence similarity that correlates with similarity of environment or with physiological requirement. However, this is not observed. For example, bat cytochrome c is much more similar to human cytochrome c than to hummingbird cytochrome c; porpoise cytochrome c is much more similar to human cytochrome c than to shark cytochrome c. As stated earlier in prediction 1.3, the phylogenetic tree constructed from the cytochrome c data exactly recapitulates the relationships of major taxa as determined by the completely independent morphological data (McLaughlin and Dayhoff 1973). These facts only further support the idea that cytochrome c sequences are independent of phenotypic function (other than the obvious requirement for a functional cytochrome c that transports electrons).
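As an illustration of how a tree can be built from pairwise sequence differences, here is a minimal sketch of UPGMA, one of the simplest distance-based clustering methods. It is not necessarily the method McLaughlin and Dayhoff used. The taxa A to D and their distances are invented for illustration, and the function returns only the branching order, not branch lengths.

```python
from itertools import combinations


def upgma(names, dist):
    """Return a nested-tuple tree (topology only) built by UPGMA.

    dist maps frozenset({x, y}) -> distance for every pair of names.
    """
    sizes = {name: 1 for name in names}  # current cluster -> number of leaves
    d = dict(dist)                       # working copy; grows as clusters merge
    while len(sizes) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        total = sizes[a] + sizes[b]
        for c in sizes:
            if c not in (a, b):
                # Size-weighted average distance from the new cluster to c.
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / total
        del sizes[a], sizes[b]
        sizes[merged] = total
    return next(iter(sizes))


# Invented pairwise distances (e.g. numbers of amino acid differences).
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
toy = {frozenset(p): v for p, v in pairs.items()}

tree = upgma(["A", "B", "C", "D"], toy)
print(tree)  # ('D', ('C', ('A', 'B')))
```

UPGMA assumes roughly constant rates across lineages; methods such as neighbor joining relax that assumption, but the principle of grouping the most similar sequences first is the same.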
The point of this prediction is subtly different from prediction 1.3, "Convergence of independent phylogenies". The evidence given above demonstrates that for many ubiquitous functional proteins (such as cytochrome c), there is an enormous number of equivalent sequences which could form that protein in any given organism. Whenever we find that two organisms have the same or very similar sequences for a ubiquitous protein, we know that something fishy is going on. Why would these two organisms have such similar ubiquitous proteins when the odds are astronomically against it? We know of only one reason why two organisms would have similar protein sequences in the absence of functional necessity: heredity. Thus, in such cases we can confidently deduce that the two organisms are genealogically related. In this sense, sequence similarity is not only a test of the theory of common descent; common descent is also a deduction from the principle of heredity and the observation of sequence similarity. Finally, the similarity observed for cytochrome c is not confined to this single ubiquitous protein; all ubiquitous proteins that have been compared between chimpanzees and humans are highly similar, and there have been many comparisons.
Without assuming the theory of common descent, the most probable result is that the cytochrome c protein sequences in all these different organisms would be very different from each other. If this were the case, a phylogenetic analysis would be impossible, and this would provide very strong evidence for a genealogically unrelated, perhaps simultaneous, origin of species (Dickerson 1972; Yockey 1992; Li 1997).
Furthermore, the very basis of this argument could be undermined easily if it could be demonstrated (1) that species-specific cytochrome c proteins were functional exclusively in their respective organisms, or (2) that no other cytochrome c sequence could function in an organism other than its own native cytochrome c, or (3) that an observed mechanism besides heredity can causally correlate the sequence of a ubiquitous protein with a specific organismic morphology.
Like protein sequence similarity, the DNA sequence similarity of two ubiquitous genes also implies common ancestry. Of course, comprehensive DNA sequence comparisons of conserved proteins such as cytochrome c also indirectly take into account amino acid sequences, since the DNA sequence specifies the protein sequence. However, with DNA sequences there is an extra level of redundancy. The genetic code itself is informationally redundant; on average there are three different codons (a codon is a triplet of DNA bases) that can specify the exact same amino acid (Voet and Voet 1995, p. 966). Thus, for the 104-amino-acid cytochrome c there are approximately 3^104, or over 10^49, different DNA sequences (and, hence, over 10^49 different possible genes) that can specify the exact same protein sequence.
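The codon arithmetic can be checked directly against the standard genetic code. The sketch below counts how many coding sequences translate to a given protein (stop codons excluded) and reproduces the average of about three codons per amino acid; the helper name and the short test peptide are illustrative.

```python
import math

# Number of codons for each amino acid in the standard genetic code
# (61 sense codons; the three stop codons are excluded).
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def synonymous_dna_count(protein: str) -> int:
    """Number of distinct coding sequences that translate to `protein`
    (one-letter amino acid codes, stop codon not counted)."""
    return math.prod(DEGENERACY[aa] for aa in protein)


average = sum(DEGENERACY.values()) / len(DEGENERACY)
print(sum(DEGENERACY.values()), average)            # 61 3.05
print(synonymous_dna_count("MKW"))                  # 2 (1 x 2 x 1)
# The text's estimate: about 3 codons per residue over 104 residues.
print(f"3^104 is about 10^{104 * math.log10(3):.1f}")  # 10^49.6
```

For a real protein the exact product depends on its amino acid composition, but with an average near three codons per residue the count for a 104-residue protein lands in the neighborhood of 10^49.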
Here we can be quite specific in our prediction. Any sequence differences between two functional cytochrome c genes are necessarily functionally neutral or nearly so. The background mutation rate in humans (and most other mammals) has been measured at ~1–5 × 10^-8 base substitutions per site per generation (Mohrenweiser 1994, pp. 128-129), and an average primate generation is about 20 years. From the fossil record, we know that humans and chimpanzees diverged from a common ancestor less than 10 million years ago (a conservative estimate; most likely less than 6 million years ago) (Stewart and Disotell 1998). Because both lineages accumulate mutations independently after the split, the expected neutral difference is roughly twice the per-generation rate multiplied by the number of generations since divergence. Thus, if chimps and humans are truly genealogically related, we predict that the difference between their respective cytochrome c gene DNA sequences should be a few percent at most (about 3% or less for a 6-million-year divergence), and probably much less, due to the essential function of the cytochrome c gene.
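The prediction rests on a simple expectation: after a split, each lineage accumulates neutral substitutions independently, so the expected fraction of differing sites is about 2 × μ × (T / g) for a per-generation rate μ, divergence time T, and generation time g. The sketch below assumes neutral sites, a constant rate, and no repeated hits at the same site (reasonable only while the result is small); the parameter values are the ones quoted in the text, and the function name is illustrative.

```python
def expected_divergence(mu_per_generation: float, years: float,
                        generation_years: float) -> float:
    """Expected fraction of sites differing between two lineages that split
    `years` ago, assuming neutral sites, a constant per-generation mutation
    rate, and no repeated substitutions at the same site."""
    generations_per_lineage = years / generation_years
    # Both lineages mutate independently, hence the factor of 2.
    return 2 * mu_per_generation * generations_per_lineage


for years in (6e6, 10e6):
    for mu in (1e-8, 5e-8):
        d = expected_divergence(mu, years, 20)
        print(f"T = {years / 1e6:.0f} Myr, mu = {mu:.0e}: {100 * d:.1f}% of sites")
```

With these inputs the neutral expectation runs from about 0.6% to 5%, so the exact ceiling depends on which divergence time and rate are assumed; purifying selection on an essential gene should push the observed value lower still.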
As mentioned above, the cytochrome c proteins in chimps and humans are exactly identical. The clincher is that the two DNA sequences that code for cytochrome c in humans and chimps differ by only four nucleotides (a 1.2% difference), even though there are over 10^49 different sequences that could code for this protein.
The combined effects of DNA coding redundancy and protein sequence redundancy make DNA sequence comparisons doubly redundant; DNA sequences of ubiquitous proteins are completely uncorrelated with phenotypic differences between species, but they are strongly causally correlated with heredity. This is why DNA sequence phylogenies are considered so robust.
The most probable result is that the DNA sequences coding for these proteins should be radically different. This would be a resounding falsification of macroevolution, and it would be very strong evidence that chimpanzees and humans are not closely genealogically related. Of course, the potential falsifications for prediction 4.1 also apply to DNA sequences.
In many ways, transposons are very similar to viruses. However, they lack genes for viral coat proteins, cannot cross cellular boundaries, and thus they replicate only in the genome of their host. They can be thought of as intragenomic parasites. Except in the rarest of circumstances, the only mode of transmission from one metazoan organism to another is directly by DNA duplication and inheritance (e.g. your transposons are given to your children) (Li 1997, pp. 338-345).
Replication for a transposon means copying itself and inserting the copied DNA randomly somewhere else in the host's genome. Transposon replication (also called transposition) has been directly observed in many organisms, including yeast, corn, wallabies, humans, bacteria, and flies, and recently the mechanisms have become well understood (Li 1997, pp. 335-338; Futuyma 1998, pp. 639-641). Specific observed cases of retrotransposition are known to have caused neurofibromatosis and hemophilia in humans (Kazazian et al. 1988; Wallace et al. 1991), and cancer, among other diseases (Deininger and Batzer 1999).
This section on transposons, and the next two sections covering pseudogenes and endogenous retroviruses, are all conceptually related. The DNA sequences in intergenic regions (regions between protein-coding genes in genomes) include very many transposons (like LINEs and SINEs), endogenous retroviruses (like HERVs), pseudogenes, and other related sequences like microsatellites. Many microsatellites are closely associated with and generated by retrotransposons like LINEs and SINEs (Arcot et al. 1995; Nadir et al. 1996; Wilder and Hollocher 2001; Yandava et al. 1997). These intergenic sequences are primarily responsible for the very specific patterns seen in "DNA fingerprinting" analyses, like those performed in paternity testing or sibling testing. Like fingerprints, these intergenic regions vary considerably between individual organisms, and the patterns are largely arbitrary. For instance, Alu elements, one type of SINE retrotransposon, insert into a new genomic location in about one of every 200 human births (Deininger and Batzer 1999), and Alus account for a significant fraction of human genetic diversity (Batzer and Deininger 2002). For the human L1 transposon, only one of many human LINE elements, around 1 in 20 individuals carries a novel retrotransposition (Scaringe et al. 2001; Ostertag and Kazazian 2001). This is a conservative estimate, given that each of us has around 50 retrotranspositionally competent L1 LINEs (Brouha et al. 2003). Intergenic regions of the genome, like all DNA, are heritable, and there is a very strong correlation between relatives. When two individuals share specific intergenic patterns far above what would be expected by chance alone, that is very strong evidence of common ancestry. This is in fact the very scientific basis behind DNA fingerprinting.
As explained above, finding the same transposon in the same chromosomal location in two different organisms is strong direct evidence of common ancestry, since transposons insert fairly randomly and generally cannot be transmitted except by inheritance. In addition, once a common ancestor carrying a certain transposition has been postulated, all the descendants of this common ancestor should also carry the same transposition. A possible exception is a transposition later removed by a rare deletion event; however, such deletions are rarely clean, and usually part of the transposon sequence remains. Using the same principles behind DNA fingerprinting, biologists have used transposons, pseudogenes, and endogenous retroviruses to demonstrate that many species, such as humans and other primates, are genetically related. A few of many examples are given below.
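The force of a shared insertion site can be roughed out with a deliberately simplified model in which every base pair is an equally likely target, so that an independent insertion in another lineage hits the same site with probability 1/G for a genome of G base pairs. Real transposons show target-site preferences, so this sketch overstates the improbability somewhat; the genome size of about 3 × 10^9 bp is approximate, and the function name is hypothetical.

```python
import math


def log10_chance_same_sites(genome_size: float, shared_insertions: int) -> float:
    """log10 probability that independent insertions in two lineages land at
    the same position for every one of `shared_insertions` loci, assuming
    every position is an equally likely target (a simplification)."""
    return -shared_insertions * math.log10(genome_size)


G = 3e9  # approximate human genome size in base pairs
for k in (1, 3, 7):
    print(f"{k} shared insertion(s): about 10^{log10_chance_same_sites(G, k):.1f}")
```

Each additional shared insertion multiplies the improbability by roughly another factor of G, which is why several shared insertions at identical loci are treated as decisive.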
A common class of transposon is the SINE retroelement (Li 1997, pp. 349-352). One important SINE transposon is the 300 bp Alu element. All primates contain many Alu elements, including humans, where they constitute 10% of the human genome (i.e. roughly 300 million bases of repetitive DNA) (Smit 1996; Li 1997, pp. 354, 357). Very recent human Alu transpositions have been used to elucidate historic and prehistoric human migrations, since some individuals have newer Alu insertions that other individuals lack (Novick et al. 1993; Novick et al. 1995). In fact, shared Alu transpositions have been demonstrated to be reliable markers of common descent in paternity cases and in criminal forensics (Novick et al. 1993; Novick et al. 1995; Roy-Engel et al. 2001). Most importantly, in the human α-globin cluster there are seven Alu elements, and each one is shared with chimpanzees in the exact same seven locations (Sawada et al. 1985).
As another example, three specific SINE transpositions have been found in the same chromosomal locations in cetaceans (whales), hippos, and ruminants, all of which are closely related according to the standard phylogenetic tree. However, all other mammals, including camels and pigs, lack these three transpositions (Shimamura 1997).
More detail on this topic can be found in Edward Max's "Plagiarized Errors and Molecular Genetics" FAQ.
For potential falsifications, see the pseudogene and endogenous retrovirus sections below, as the same principles apply here.
Other molecular examples that provide evidence of common ancestry are curious DNA sequences known as pseudogenes. Pseudogenes are very closely related to functional, protein-coding genes. The similarity involves both the primary DNA sequence and often the specific chromosomal location of the genes. The functional counterparts of pseudogenes are normal genes that are transcribed into mRNA, which is in turn actively translated into functional protein. In contrast, pseudogenes have faulty regulatory sequences that prevent the gene from being transcribed into mRNA, or they have internal stop codons that keep the functional protein from being made. In this sense, pseudogenes are molecular examples of vestigial structures.
However, pseudogenes are included here under a separate prediction because many pseudogenes are unusual in an additional way. Morphological vestiges have lost their original function, and the organism carrying the vestige has likewise lost that function. In contrast, pseudogenes have lost their original function, yet the organism itself may still retain that function if it carries the functional counterpart of these pseudogenes. Pseudogenes that are vestigial in the morphological sense, like the vitamin C synthesis pseudogene, are considered in prediction 2.3. The remaining type of pseudogene, in which an organism carries both a functional gene and one or more counterpart pseudogenes, is hereafter termed a "redundant pseudogene".
Most pseudogenes are largely non-functional. There are several lines of evidence that support this conclusion. First, the presence or absence of most specific pseudogenes has no measurable effect on organismal phenotype. Second, there are good mechanistic, genetic arguments indicating pseudogenes have little, if any, function. Pseudogenes have complex sequences highly similar or identical to those required for the proper function of other enzymatic or structural proteins. These normal genes are actively transcribed and translated into proteins, whereas pseudogenes are untranslated, untranscribed, or both. Thus, pseudogenes cannot perform the functions of the proteins they encode. If pseudogenes do have a function, they must perform relatively simple functions for which the protein encoded by them was not designed.
Third, if a pseudogene has little or no function, then most mutations in the pseudogene will have only minor functional consequences, and many mutations will not be weeded out by purifying selection. Therefore, we expect that truly non-functional pseudogenes should accumulate mutations at the background rate of mutation. Pseudogenes with minor functions will accumulate mutations near the background rate. As expected if pseudogenes have little, if any, function, most pseudogenes accumulate mutations at the fastest rate known for any region of DNA in animal genomes. Furthermore, the rate of mutation inferred for pseudogenes from phylogenetic analysis matches very closely the measured rates of spontaneous mutations. For more information and references, see Prediction 5.8.
Fourth and finally, we understand how redundant pseudogenes are created, and we have observed the creation of new redundant pseudogenes in the lab and in the wild. Redundant pseudogenes originate by gene duplication and subsequent mutation. Many observed processes are known to duplicate genes, including transposition events, chromosomal duplication, and unequal crossing over of chromosomes.
These facts offer strong support for the conclusion that most pseudogenes have little, if any, function. Like transpositions (see prediction 4.3), the creation of new redundant pseudogenes by gene duplication is a rare and random event and, of course, any duplicated DNA is inherited. Thus, finding the same pseudogene in the same chromosomal location in two species is strong evidence of common ancestry.
There are very many examples of redundant pseudogenes shared between humans and other primates. One is the ψη-globin gene, a hemoglobin pseudogene. It is shared only among primates, in the same chromosomal location, with the same mutations that destroy its function as a protein-coding gene (Goodman et al. 1989). Another example is the steroid 21-hydroxylase gene. Humans have two copies of the steroid 21-hydroxylase gene, a functional one and an untranslated pseudogene. Inactivation of the functional gene leads to congenital adrenal hyperplasia (CAH, a rare and serious genetic disease), giving positive evidence that the 21-hydroxylase pseudogene lacks its proper function. Both chimpanzees and humans share the same eight-base-pair deletion in this pseudogene that renders it incapable of its normal function (Kawaguchi et al. 1992).
As explained above, observed gene duplications are rare and random events. Thus, it is highly unlikely that other mammals would have these same redundant pseudogenes in the same chromosomal locations, with the same mutations that cripple their normal functions. For instance, it is essentially impossible for mice to carry the 21-hydroxylase pseudogene, in the same genomic location, with the same eight-base-pair deletion that destroys its enzymatic function.
Furthermore, once a gene is duplicated and mutations render it a redundant pseudogene, it is inherited by all descendants. Thus, once certain organisms are found that carry the same pseudogene, common descent requires that all organisms phylogenetically intermediate must also carry that pseudogene. For example, suppose we find that humans and old world monkeys share a certain redundant pseudogene. According to common descent, all apes (including chimpanzees, gorillas, orangutans, and siamangs) must also necessarily carry that same redundant pseudogene in the same chromosomal location. This conclusion rests on the premise that there are no mechanisms for removing pseudogenes from genomes (or that the mechanisms are very inefficient). This apparently is true for vertebrates, but some organisms with short generation times, such as bacteria, protists, and Drosophila, are known to have mechanisms that remove excess DNA.
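The prediction in this paragraph can be stated mechanically: find the most recent common ancestor of the taxa observed to carry the pseudogene, and every tip below that ancestor should carry it too (assuming no loss). The sketch below encodes the standard branching order of humans, apes, and Old World monkeys as nested Python tuples; the function names and the carrier sets are illustrative.

```python
def leaves(tree):
    """All tip names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def smallest_clade_containing(tree, taxa):
    """Subtree rooted at the most recent common ancestor of `taxa`."""
    if isinstance(tree, str):
        return tree
    for child in tree:
        if taxa <= leaves(child):
            return smallest_clade_containing(child, taxa)
    return tree


def predicted_carriers(tree, observed):
    """Under common descent (and no loss), every tip descended from the most
    recent common ancestor of the observed carriers should carry the element."""
    return leaves(smallest_clade_containing(tree, set(observed)))


catarrhines = (((((("human", "chimpanzee"), "gorilla"), "orangutan"), "siamang"),
               "old_world_monkey"))

print(sorted(predicted_carriers(catarrhines, {"human", "old_world_monkey"})))
# ['chimpanzee', 'gorilla', 'human', 'old_world_monkey', 'orangutan', 'siamang']
```

Finding the pseudogene in humans and Old World monkeys therefore predicts it in every ape listed, exactly as the paragraph states.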
Note, this confirmation and potential falsification are independent of whether a specific pseudogene has a function or whether it is completely non-functional, for the same reasons explained in the prediction on morphological vestiges. Like any other genetic element or organismic structure, evolutionary opportunism may take a pseudogene and press it into a new and different function.
Figure 4.4.1. Human endogenous retrovirus K (HERV-K) insertions in identical chromosomal locations in various primates (Reprinted from Lebedev et al. 2000, © 2000, with permission from Elsevier Science).
Endogenous retroviruses provide yet another example of molecular sequence evidence for universal common descent. Endogenous retroviruses are molecular remnants of a past parasitic viral infection. Occasionally, copies of a retrovirus genome are found in its host's genome, and these retroviral gene copies are called endogenous retroviral sequences. Retroviruses (like the AIDS virus or HTLV-1, which causes a form of leukemia) make a DNA copy of their own viral genome and insert it into their host's genome. If this happens to a germ-line cell (i.e. the sperm or egg cells), the retroviral DNA will be inherited by descendants of the host. Again, this process is rare and fairly random, so finding retrogenes in identical chromosomal positions of two different species indicates common ancestry.
In humans, endogenous retroviruses occupy about 1% of the genome, in total constituting ~30,000 different retroviruses embedded in each person's genomic DNA (Sverdlov 2000). There are at least seven different known instances of common retrogene insertions between chimps and humans, and this number is sure to grow as both these organisms' genomes are sequenced (Bonner et al. 1982; Dangel et al. 1995; Svensson et al. 1995; Kjellman et al. 1999; Lebedev et al. 2000; Sverdlov 2000). Figure 4.4.1 shows a phylogenetic tree of several primates, including humans, from a recent study which identified numerous shared endogenous retroviruses in the genomes of these primates (Lebedev et al. 2000). The arrows designate the relative insertion times of the viral DNA into the host genome. All branches after the insertion point (to the right) carry that retroviral DNA, a reflection of the fact that once a retrovirus has inserted into the germ-line DNA of a given organism, it will be inherited by all descendants of that organism.
The Felidae (i.e. cats) provide another example. The standard phylogenetic tree has small cats diverging later than large cats. The small cats (e.g. the jungle cat, European wildcat, African wildcat, black-footed cat, and domestic cat) share a specific retroviral gene insertion. In contrast, all other carnivores which have been tested lack this retrogene (Futuyma 1998, pp. 293-294; Todaro et al. 1975).
It would make no sense, macroevolutionarily, if certain other mammals (e.g. dogs, cows, platypuses, etc.) had these same retrogenes in the exact same chromosomal locations. For instance, it would be incredibly unlikely for dogs to also carry the three HERV-K insertions that are unique to humans, as shown in the upper right of Figure 4.4.1, since none of the other primates have these retroviral sequences.
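The falsification criterion can be checked in a similar way: an insertion produced once and then inherited should be carried by exactly one clade of the tree, never by a scattered set of unrelated tips. The sketch below tests that condition. The tree is simplified, the carrier sets are hypothetical, and it ignores real complications such as later deletions or incomplete lineage sorting among very closely related species.

```python
def leaves(tree):
    """All tip names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Tip sets of every subtree (each single tip counts as a trivial clade)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = [frozenset(leaves(tree))]
    for child in tree:
        result.extend(clades(child))
    return result


def consistent_with_tree(tree, carriers):
    """True if the taxa carrying an insertion form exactly one clade, i.e. the
    pattern is explained by a single insertion followed by inheritance."""
    return frozenset(carriers) in set(clades(tree))


mammals = (((("human", "chimpanzee"), "gorilla"), "orangutan"), "dog")

print(consistent_with_tree(mammals, {"human"}))                          # True
print(consistent_with_tree(mammals, {"human", "chimpanzee", "gorilla"}))  # True
print(consistent_with_tree(mammals, {"human", "dog"}))                   # False
```

A pattern like the last one, a human-specific HERV-K insertion also found in dogs but in no other primate, is exactly the kind of result that would count against common descent.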
Arcot, S.S., Wang, Z., Weber, J.L., Deininger, P.L., and Batzer, M.A. (1995) "Alu repeats: a source for the genesis of primate microsatellites." Genomics. 29: 136-144. [PubMed]
Batzer, M. A., and Deininger, P. L. (2002) "Alu repeats and human genomic diversity." Nat Rev Genet. 3: 370. [PubMed]
Bonner, T. I., C. O'Connell, et al. (1982) "Cloned endogenous retroviral sequences from human DNA." PNAS 79: 4709. [PubMed]
Branden, C. and Tooze, J. (1999) Introduction to Protein Structure. Second Ed., New York, Garland Publishing. [Publisher's Site]
Brouha, B., Schustak, J., Badge, R.M., Lutz-Prigge, S., Farley, A.H., Moran, J.V., Kazazian, H.H. (2003) "Hot L1s account for the bulk of retrotransposition in the human population." Proc Natl Acad Sci U S A. 100: 5280-5285. [PNAS free full text]
Clements, J. M., O'Connell, L. I., Tsunasawa, S., and Sherman, F. (1989) "Expression and activity of a gene encoding rat cytochrome c in the yeast Saccharomyces cerevisiae." Gene 83: 1-14. [PubMed]
Dangel, A. W., B. J. Baker, et al. (1995) "Complement component C4 gene intron 9 as a phylogenetic marker for primates: long terminal repeats of the endogenous retrovirus ERV-K(C4) are a molecular clock of evolution." Immunogenetics 42: 41-52. [PubMed]
Deininger, P. L., and Batzer, M. A. (1999) "Alu repeats and human disease." Mol Genet Metab. 67: 183-193. [PubMed]
Goodman, M., Koop, B. F., et al. (1989) "Molecular phylogeny of the family of apes and humans." Genome 31: 316-335. [PubMed]
Hampsey, D. M., Das, G., and Sherman, F. (1986) "Amino acid replacements in yeast iso-1-cytochrome c." Journal of Biological Chemistry 261: 3259-71. [PubMed]
Hampsey, D. M., Das, G., and Sherman, F. (1988) "Yeast iso-1-cytochrome c: genetic analysis of structural requirements." FEBS Letters 231: 275-83. [PubMed]
Hickey, D. R., Jayaraman, K., Goodhue, C. T., Shah, J., Fingar, S. A., Clements, J. M., Hosokawa, Y., Tsunasawa, S., and Sherman, F. (1991) "Synthesis and expression of genes encoding tuna, pigeon, and horse cytochromes c in the yeast Saccharomyces cerevisiae." Gene 105: 73-81. [PubMed]
Kawaguchi, H., O'hUigin, C., et al. (1992) "Evolutionary origin of mutations in the primate cytochrome P450c21 gene." American Journal of Human Genetics 50: 766-780. [PubMed]
Kazazian, H. H. (1999) "An estimated frequency of endogenous insertional mutations in humans." Nat Genet. 22: 130. [PubMed]
Kazazian, H. H., Wong, C., et al. (1988) "Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man." Nature 332: 164. [PubMed]
Kjellman, C., Sjogren, H. O., et al. (1999) "HERV-F, a new group of human endogenous retrovirus sequences." Journal of General Virology 80: 2383. http://vir.sgmjournals.org/cgi/content/full/80/9/2383
Koshy, T. I., Luntz, T. L., Garber, E. A., and Margoliash, E. (1992) "Expression of recombinant cytochromes c from various species in Saccharomyces cerevisiae: post-translational modifications." Protein Expr. Purif. 3: 441-52. [PubMed]
Lebedev, Y. B., Belonovitch, O. S., Zybrova, N. V., Khil, P. P., Kurdyukov, S. G., Vinogradova, T. V., Hunsmann, G., and Sverdlov, E. D. (2000) "Differences in HERV-K LTR insertions in orthologous loci of humans and great apes." Gene 247: 265-277. [PubMed]
Lesk, Arthur M. (2001) Introduction to Protein Architecture. Oxford, Oxford University Press. [Publisher's Site]
Matthews, B. W. (1996) "Structural and genetic analysis of the folding and function of T4 lysozyme." FASEB J. 10: 35-41. [PubMed]
Mohrenweiser, H. (1994) "Impact of the molecular spectrum of mutational lesions on estimates of germinal gene-mutation rates." Mutation Research 304: 119-137. [PubMed]
Nadir, E., Margalit, H., Gallily, T., and Ben-Sasson, S. A. (1996) "Microsatellite spreading in the human genome: evolutionary mechanisms and structural implications." Proc Natl Acad Sci U S A. 93: 6470-6475. [PubMed]
Novick, G. E., Gonzalez, T., Garrison, J., Novick, C. C., Batzer, M. A., Deininger, P. L., Herrera, R. J. (1993) "The use of polymorphic Alu insertions in human DNA fingerprinting." EXS. 67: 283-291. [PubMed]
Novick, G. E., Novick, C. C., Yunis, J., Yunis, E., Martinez, K., Duncan, G. G., Troup, G. M., Deininger, P. L., Stoneking, M., Batzer, M. A., et al. (1995) "Polymorphic human specific Alu insertions as markers for human identification." Electrophoresis 16: 1596-1601. [PubMed]
Ostertag, E. M., and Kazazian, H. H. (2001) "Biology of mammalian L1 retrotransposons." Annu Rev Genet. 35: 501-538. [PubMed]
Ptitsyn, O. B. (1998) "Protein folding and protein evolution: common folding nucleus in different subfamilies of c-type cytochromes." Journal of Molecular Biology 278: 655. [PubMed]
Roy-Engel, A. M., Carroll, M. L., Vogel, E., Garber, R. K., Nguyen, S. V., Salem, A. H., Batzer, M. A., and Deininger, P. L. (2001) "Alu insertion polymorphisms for the study of human genomic diversity." Genetics 159: 279-290. http://www.genetics.org/cgi/content/full/159/1/279
Sawada, I., Willard, C., et al. (1985) "Evolution of Alu family repeats since the divergence of human and chimpanzee." Journal of Molecular Evolution 22: 316. [PubMed]
Li, X., Scaringe, W. A., Hill, K. A., Roberts, S., Mengos, A., Careri, D., Pinto, M. T., Kasper, C. K., and Sommer, S. S. (2001) "Frequency of recent retrotransposition events in the human factor IX gene." Hum Mutat. 17: 511-519. [PubMed]
Scarpulla, R. C., and Nye, S. H. (1986) "Functional expression of rat cytochrome c in Saccharomyces cerevisiae." Proc Natl Acad Sci 83: 6352-6. [PubMed]
Shimamura, M., et al. (1997) "Molecular evidence from retroposons that whales form a clade within even-toed ungulates." Nature 388: 666. [PubMed]
Smit, A. F. A. (1996) "The origin of interspersed repeats in the human genome." Current Opinion in Genetics and Development 6: 743-748. [PubMed]
Stewart, C. B. and Disotell, T. R. (1998) "Primate evolution - in and out of Africa." Current Biology 8: R582-588. [PubMed]
Svensson, A. C., Setterblad, N., et al. (1995) "Primate DRB genes from the DR3 and DR8 haplotypes contain ERV9 LTR elements at identical positions." Immunogenetics 41: 74. [PubMed]
Sverdlov, E. D. (2000) "Retroviruses and primate evolution." BioEssays 22: 161-171. [PubMed]
Tanaka, Y., Ashikari, T., Shibano, Y., Amachi, T., Yoshizumi, H., and Matsubara, H. (1988a) "Amino acid replacement studies of human cytochrome c by a complementation system using CYC1 deficient yeast." J Biochem (Tokyo) 104: 477-80. [PubMed]
Tanaka, Y., Ashikari, T., Shibano, Y., Amachi, T., Yoshizumi, H., and Matsubara, H. (1988b) "Construction of a human cytochrome c gene and its functional expression in Saccharomyces cerevisiae." J Biochem (Tokyo) 103: 954-61. [PubMed]
Todaro, G. J., Benveniste, R. E., Callahan, R., Lieber, M. M., and Sherr, C. J. (1975) "Endogenous primate and feline type C viruses." Cold Spring Harb Symp Quant Biol. 39 Pt 2: 1159-1168. [PubMed]
Wallace, C. J., and Tanaka, Y. (1994) "Improving cytochrome c function by protein engineering?: studies of site-directed mutants of the human protein." J. Biochem. (Tokyo) 115: 693-700. [PubMed]
Wallace, M. R., Andersen, L. B., et al. (1991) "A de novo Alu insertion results in neurofibromatosis type 1." Nature 353: 864-866. [PubMed]
Wilder, J., and Hollocher, H. (2001) "Mobile elements and the genesis of microsatellites in dipterans." Mol Biol Evol. 18: 384-392. [PubMed]
Yandava, C. N., Gastier, J. M., Pulido, J. C., Brody, T., Sheffield, V., Murray, J., Buetow, K., and Duyk, G. M. (1997) "Characterization of Alu repeats that are associated with trinucleotide and tetranucleotide repeat microsatellites." Genome Res. 7: 716-724.