Summary
Susumu Ohno’s theory of “evolution by duplication”, proposed in 1970, can now be tested against the available genome sequences. Several recent mathematical models that explain the topology of protein interaction networks also implement this idea. The power-law distribution with its “hubby” topology (e.g., P53 has been shown to interact with an unusually large number of other proteins) can be explained under one assumption: new proteins, which are duplicates of older proteins, tend to interact only with the same proteins as their evolutionary predecessors. Since protein interaction networks, like other higher-level cellular processes, are encoded in genomic sequences, the evolutionary structure, topology, and statistics of many biological objects (pathways, phylogeny, symbiotic relations, etc.) are rooted in the evolutionary dynamics of the genome sequences. Ohno’s hypothesis can be tested “in silico” using Pólya’s urn model. In our model, each basic DNA sequence change is modelled by several probability distribution functions, which determine the insertion and deletion positions of the DNA fragments, the copy numbers of the inserted fragments, and the sequences of the inserted or deleted pieces. These functions can be interdependent. A mathematically tractable model can be built on a directed graph representation: such graphs are Eulerian, and each possible Eulerian path encodes a genome. Every “genome duplication” event evolves these Eulerian graphs, and the probability distributions and their dynamics give rise to many intriguing and elegant mathematical problems. In this chapter, we explore and survey these connections between biology, mathematics, and computer science in order to reveal simple, yet deep, models of life itself.
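As a rough illustration of the urn-model idea mentioned in the summary, the following Python sketch simulates the classical Pólya urn: a ball is drawn at random, returned, and one duplicate of the same colour is added. This is only the textbook urn, not the chapter's model, which uses several interdependent distributions for positions, copy numbers, and sequences. The function name `polya_urn`, the use of nucleotide letters as colours, and the seed parameter are assumptions made for this example.

```python
import random
from collections import Counter


def polya_urn(initial, steps, seed=None):
    """Simulate a classical Polya urn (illustrative sketch only).

    initial: dict mapping a colour label to its starting ball count.
    steps:   number of draw-and-duplicate rounds.
    seed:    optional seed so that runs are reproducible.
    """
    rng = random.Random(seed)
    urn = Counter(initial)
    for _ in range(steps):
        colours = list(urn.keys())
        weights = [urn[c] for c in colours]
        # Draw one ball with probability proportional to its colour's count.
        drawn = rng.choices(colours, weights=weights, k=1)[0]
        # Return the ball and add one duplicate of the same colour.
        urn[drawn] += 1
    return dict(urn)


if __name__ == "__main__":
    # Hypothetical setup: four "colours" standing in for sequence families.
    counts = polya_urn({"A": 1, "C": 1, "G": 1, "T": 1}, steps=1000, seed=0)
    print(counts)
```

Because each draw favours colours that are already common, early duplications are reinforced over time. That is the same "rich get richer" behaviour the summary invokes to explain hub-dominated topologies.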
References
Peng, C.K. et al: Long-range correlations in nucleotide sequences. Nature 356, 168–170 (1992)
Gomez, S.M., Rzhetsky, A.: Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome. Bioinformatics 17, 988–996 (2001)
Fields, S., Schwikowski, B., Uetz, P.: A network of protein-protein interactions in yeast. Nature Biotechnology 18, 1257–1261 (2000)
Albert, R., Barabasi, A.-L.: Statistical mechanics of complex networks. Reviews of Modern Physics 74, 48–97 (2002)
Havlin, S. et al: Mosaic organization of DNA nucleotides. Physical Review E 49, 1685–1689 (1994)
Ehrlich, S.D., Viguera, E., Canceill, D.: Replication slippage involves DNA polymerase pausing and dissociation. EMBO Journal 20, 2587–2596 (2001)
Lilley, D.M.J., Eckstein, F.: DNA Repair (Springer, Berlin Heidelberg New York 1998)
Albert, R. et al: The large-scale organization of metabolic networks. Nature 407, 651–654 (2000)
Barabasi, A.L. et al: Lethality and centrality in protein networks. Nature 411, 41–42 (2001)
Gerstein, M., Qian, J., Luscombe, N.M.: Protein family and fold occurrence in genomes: power-law behavior and evolutionary model. Journal of Molecular Biology 313, 673–681 (2001)
Rain, J.C. et al: The protein-protein interaction map of Helicobacter pylori. Nature 409, 211–215 (2001)
Vogelstein, B., Lane, D., Levine, A.J.: Surfing the P53 network. Nature 408, 307–310 (2000)
Johnson, N.L.: Urn models and their application (Wiley 1977)
Ganapathiraju, M. et al: Comparative n-gram analysis of whole-genome protein sequences. In: HLT’02: Human Language Technologies Conference, San Diego, California, USA, 2002.
Ohno, S.: Evolution by Gene Duplication (Springer, Berlin Heidelberg New York 1970)
Apweiler, R. et al: The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Research 29, 37–40 (2000)
Sole, R.V., Pastor-Satorras, R., Smith, E.: Evolving protein interaction networks through gene duplication. Santa Fe Institute Working Paper 02-02-008 (2002)
Mantegna, R.N. et al: Linguistic features of noncoding DNA sequences. Physical Review Letters 73, 3169–3172 (1994)
Sneppen, K., Maslov, S.: Specificity and stability in topology of protein networks. Science 296, 910–913 (2002)
Buldyrev, S.V. et al: Fractal landscapes and molecular evolution: modeling the myosin heavy chain gene family. Biophysical Journal 65, 2673–2679 (1993)
Eichler, E.E.: Recent duplication, domain accretion and the dynamic mutation of the Human genome. Trends in Genetics 17, 661–669 (2001)
Bailey, J.A. et al: Recent segmental duplications in the human genome. Science 297, 1003–1007 (2002)
Graur, D., Li, W-H.: Fundamentals of Molecular Evolution (Sinauer 2000)
Gu, X., Li, W.-H.: The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. Journal of Molecular Evolution 40, 464–473 (1995)
Ophir, R., Graur, D.: Patterns and rates of indel evolution in processed pseudogenes from humans and murids. Gene 205, 191–202 (1997)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Zhou, Y., Mishra, B. (2004). Models of Genome Evolution. In: Ciobanu, G., Rozenberg, G. (eds) Modelling in Molecular Biology. Natural Computing Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18734-6_13
DOI: https://doi.org/10.1007/978-3-642-18734-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-62269-4
Online ISBN: 978-3-642-18734-6
eBook Packages: Springer Book Archive