Abstract
Protein segments that contain few of the twenty possible amino acids, sometimes in tandem repeat arrays, are referred to as “simple” or “low complexity” sequence. Many proteins of the malaria parasite, Plasmodium falciparum, are longer than their homologs in other species because they contain low complexity segments with no known function; these are interspersed among segments of higher complexity to which function can often be ascribed. Low complexity at the protein level implies low complexity at the corresponding nucleic acid level (often seen as a departure from equifrequency of the four bases). Thus, low complexity may have been selected primarily at the nucleic acid level, and low complexity at the protein level may be secondary. The amino acids in low complexity segments may be mere placeholders. If so, the amino acid composition of low complexity segments should reflect forces operating at the nucleic acid level, such as GC-pressure, AG-pressure, AC-pressure, and fold pressure, more strongly than does that of high complexity segments. Consistent with this, at the amino acid-determining first and second codon positions, low complexity segments contribute significantly to downward GC-pressure (decreased percentage of G+C) and to upward AG-pressure (increased percentage of A+G). When not countermanded by high contributions to AG-pressure, which locally decrease fold potential, low complexity segments can also contribute to fold potential and can thereby influence recombination within a gene. Short tandem repeat sequences under AC-pressure violate Chargaff's second parity rule (PR2) and are extruded asymmetrically as stem-loops from DNA duplexes. This may favor specialized forms of somatic recombination, but probably does not affect meiotic pairing of chromosomes. These observations have implications for our understanding of malaria, infectious mononucleosis, and brain diseases in which protein aggregates accumulate.
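The quantities named in the abstract can be illustrated with a minimal sketch. It assumes that sequence complexity is approximated by Shannon entropy (one common choice, not necessarily the measure used in the chapter) and that GC and AG percentages are taken over the first and second codon positions of an in-frame coding sequence. The function names and the toy asparagine repeat are hypothetical and serve only as an illustration.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq):
    """Shannon entropy in bits per symbol; lower values mean lower complexity."""
    if not seq:
        raise ValueError("sequence is empty")
    n = len(seq)
    return sum(-(c / n) * log2(c / n) for c in Counter(seq).values())


def base_percentages(cds, positions=(0, 1)):
    """Return (%G+C, %A+G) at the given 0-based codon positions.

    Assumes `cds` is an in-frame coding sequence; a trailing
    partial codon, if any, is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    bases = [cds[i + p] for i in range(0, usable, 3) for p in positions]
    if not bases:
        raise ValueError("no complete codon found")
    n = len(bases)
    gc = 100.0 * sum(b in "GC" for b in bases) / n
    ag = 100.0 * sum(b in "AG" for b in bases) / n
    return gc, ag


# Hypothetical toy segment: ten AAT codons, each encoding asparagine (N).
repeat_dna = "AAT" * 10
repeat_protein = "N" * 10

print(shannon_entropy(repeat_protein))   # a single residue type gives zero entropy
print(base_percentages(repeat_dna))      # (%G+C, %A+G) at codon positions 1 and 2
```

In this toy case the protein segment has zero entropy, and its first and second codon positions contain no G or C and only purines, which is the direction of GC- and AG-pressure that the abstract describes for low complexity segments.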
All perception and all response, all behaviour and all classes of behaviour, all learning and all genetics, all neurophysiology and all endocrinology, all organization and all evolution – one entire subject matter – must be regarded as communicational in nature, and therefore subject to the great generalizations or ‘laws’ which apply to communicational phenomena. We therefore are warned to expect to find in our data those principles of order which fundamental communication theory would propose.
Gregory Bateson (1964) [1]
References
Bateson G (1964) The logical categories of learning and communication. In: Steps to an Ecology of Mind. Paladin, St. Albans (1973) pp 250–279
Sibbald PR (1989) Calculating higher order DNA sequence information measures. Journal of Theoretical Biology 136:475–483
Wan H, Wootton JC (2000) A global complexity measure for biological sequences. AT-rich and GC-rich genomes encode less complex proteins. Computers & Chemistry 24:71–94
Cristillo AD, Mortimer JR, Barrette IH, Lillicrap TP, Forsdyke DR (2001) Double-stranded RNA as a not-self alarm signal: to evade, most viruses purine-load their RNAs, but some (HTLV-1, EBV) pyrimidine-load. Journal of Theoretical Biology 208:475–491
Forsdyke DR (2002) Selective pressures that decrease synonymous mutations in Plasmodium falciparum. Trends in Parasitology 18:411–418
Xue HY, Forsdyke DR (2003) Low complexity segments in Plasmodium falciparum proteins are primarily nucleic acid level adaptations. Molecular & Biochemical Parasitology 128:21–32
Pizzi E, Frontali C (2001) Low-complexity regions in Plasmodium falciparum proteins. Genome Research 11:218–229
Forsdyke DR (1996) Stem-loop potential: a new way of evaluating positive Darwinian selection? Immunogenetics 43:182–189
Figueroa AA, Delaney S (2010) Mechanistic studies of hairpin to duplex conversion for trinucleotide repeat sequences. Journal of Biological Chemistry 285:14648–14657
Suhr ST, Senut M-C, Whitelegge JP, Faull KF, Cuizon DB, Gage FH (2001) Identities of sequestered proteins in aggregates from cells with induced polyglutamine expression. Journal of Cell Biology 153:283–294
Tian B, et al. (2000) Expanded CUG repeat RNAs form hairpins that activate the double-stranded RNA-dependent protein kinase PKR. RNA 6:79–87
Peel AL, Rao RV, Cottrell BA, Hayden MR, Ellerby LM, Bredesen DE (2001) Double-stranded RNA-dependent protein kinase, PKR, binds preferentially to Huntington’s disease (HD) transcripts and is activated in HD tissue. Human Molecular Genetics 10:1531–1538
O’Rourke JR, Swanson MS (2009) Mechanisms of RNA-mediated disease. Journal of Biological Chemistry 284:7419–7423
Flamm WG, Walker PM, McCallum M (1969) Some properties of the single strands isolated from the DNA of the nuclear satellite of the mouse (Mus musculus). Journal of Molecular Biology 40:423–443
Zhang C, Xu S, Wei J-F, Forsdyke DR (2008) Microsatellites that violate Chargaff's second parity rule have base order-dependent asymmetries in the folding energies of complementary DNA strands and may not drive speciation. Journal of Theoretical Biology 254:168–177
Orgel LE, Crick FH (1980) Selfish DNA: the ultimate parasite. Nature 284:604–607
Robertson M (1981) Gene families, hopeful monsters and the selfish genetics of DNA. Nature 293:333–334
Flavell RB (1982) Sequence amplification, deletion and rearrangement: major sources of variation during species divergence. In: Dover GA, Flavell RB (eds) Genome Evolution. Academic Press, San Diego, pp 301–323
Jeffreys AJ (1985) Individual-specific ‘fingerprints’ of DNA. Nature 316:76–79
Biémont C (2008) Within species variation in genome size. Heredity 101:297–298
Ellegren H (2004) Microsatellites: simple sequences with complex evolution. Nature Reviews Genetics 5:435–445
Majewski J, Ott J (2000) GT repeats are associated with recombination on human chromosome 22. Genome Research 10:1108–1114
Lao PJ, Forsdyke DR (2000) Crossover hot-spot instigator (CHI) sequences in Escherichia coli occupy distinct recombination/transcription islands. Gene 243:47–57
Huang F-T, Yu K, Balter BB, Selsing E, Oruc Z, Khamlichi AA, Hsieh C-L, Lieber MR (2007) Sequence dependence of chromosomal R-loops at the immunoglobulin heavy-chain Smu class switch region. Molecular & Cellular Biology 27:5921–5932
Gvozdev VA, Kogan GL, Usakin KA (2005) The Y chromosome as a target of acquired and amplified genetic material in evolution. BioEssays 27:1256–1262
Talbert PB, Henikoff S (2010) Centromeres convert but don’t cross. PLOS Biology 8:e1000326
Forsdyke DR, Zhang C, Wei J-F (2010) Chromosomes as interdependent accounting units. The assigned orientation of C. elegans chromosomes minimizes the total W-base Chargaff difference. Journal of Biological Systems 18:1–16
Wahls WP (1998) Meiotic recombination hotspots: shaping the genome and insights into hypervariable minisatellite DNA change. Current Topics in Developmental Biology 37:37–75
Pratto F, Brick K, Khil P, Smagulova F, Petukhova GV, Camerini-Otero RD (2014) Recombination initiation maps of individual human genomes. Science 346:826
Wahls WP, Davidson MK (2011) DNA sequence-mediated, evolutionarily rapid redistribution of meiotic recombination hotspots. Genetics 189:685–694
Trifonov EN, Sussman JL (1980) The pitch of chromatin DNA is reflected in its nucleotide sequence. Proceedings of the National Academy of Sciences USA 77:3816–3820
Trifonov EN (1998) 3-, 10.5-, 200-, and 400-base periodicities in genome sequences. Physica A 249:511–516
Schieg P, Herzel H (2004) Periodicities of 10-11 bp as indicators of the supercoiled state of genomic DNA. Journal of Molecular Biology 343:891–901
Copyright information
© 2016 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Forsdyke, D.R. (2016). Complexity. In: Evolutionary Bioinformatics. Springer, Cham. https://doi.org/10.1007/978-3-319-28755-3_14
DOI: https://doi.org/10.1007/978-3-319-28755-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28753-9
Online ISBN: 978-3-319-28755-3
eBook Packages: Biomedical and Life Sciences (R0)