Coevolutionary Analysis of Protein Sequences for Molecular Modeling

Biomolecular Simulations

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2022))

Abstract

Thanks to the explosion of genomic sequencing, coevolutionary analysis of protein sequences has grown steadily in popularity over the last decade and is now an important, well-established tool in structural bioinformatics and computational biology. This chapter concisely introduces the theoretical foundation and practical aspects of coevolutionary analysis, and discusses molecular modeling strategies that exploit its results in the study of protein structure, dynamics, and interactions. We present a complete pipeline, from sequence extraction to contact prediction, through two examples: the prediction of inter-residue contacts in a single protein domain, and the analysis of a multi-domain protein that undergoes functional, large-scale conformational transitions.



Acknowledgements

The authors thank Paolo De Los Rios, Faruck Morcos, Elijah Irvine, Rémy Bailly, and Camille Elleaume for their critical reading of this manuscript. Duccio Malinverni acknowledges the support of the National Science Foundation under grants 2012_149278 and 20020_163042/1. Alessandro Barducci acknowledges the support of the Agence Nationale de la Recherche (ANR) under grant ANR-14-ACHN-0016.

Author information

Corresponding authors

Correspondence to Duccio Malinverni or Alessandro Barducci.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Malinverni, D., Barducci, A. (2019). Coevolutionary Analysis of Protein Sequences for Molecular Modeling. In: Bonomi, M., Camilloni, C. (eds) Biomolecular Simulations. Methods in Molecular Biology, vol 2022. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9608-7_16

  • DOI: https://doi.org/10.1007/978-1-4939-9608-7_16

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9607-0

  • Online ISBN: 978-1-4939-9608-7

  • eBook Packages: Springer Protocols
