Evolution of Protein Domain Architectures

  • Protocol
Evolutionary Genomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 856)

Abstract

This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly affects which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that family sizes generally follow power-law distributions, both within genomes and within larger evolutionary groups. These findings were subsequently extended to multidomain architectures. We review the genome evolution models that have been proposed to explain the shape of these distributions, as well as the evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we examine the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architectures and the conclusions about the mechanisms of domain architecture evolution that can be drawn from them. Finally, we examine whether all known cases of a given domain architecture can be assumed to share a single common origin (monophyly) or whether some have instead evolved convergently (polyphyly).

Author information

Corresponding author

Correspondence to Erik L. L. Sonnhammer.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Forslund, K., Sonnhammer, E.L.L. (2012). Evolution of Protein Domain Architectures. In: Anisimova, M. (eds) Evolutionary Genomics. Methods in Molecular Biology, vol 856. Humana Press. https://doi.org/10.1007/978-1-61779-585-5_8

  • DOI: https://doi.org/10.1007/978-1-61779-585-5_8

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-584-8

  • Online ISBN: 978-1-61779-585-5

  • eBook Packages: Springer Protocols
