Methods for Sequence–Structure Alignment

Homology Modeling

Part of the book series: Methods in Molecular Biology (MIMB, volume 857)

Abstract

Homology modeling is based on the observation that related protein sequences adopt similar three-dimensional structures. Hence, a homology model of a protein can be derived using related protein structure(s) as modeling template(s). A key step in this approach is establishing the correspondence between residues of the protein to be modeled and those of the modeling template(s). This step, often referred to as sequence–structure alignment, is one of the major determinants of the accuracy of a homology model. This chapter gives an overview of methods for deriving sequence–structure alignments and discusses recent methodological developments that have improved performance. However, no method is perfect. A second focus of the chapter is therefore how to identify alignment regions that are likely to contain errors and how to improve them. Finally, the chapter provides practical guidance on how to get the most out of the available tools to maximize the accuracy of sequence–structure alignments.
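
To make the idea of residue correspondence concrete, the following is a minimal sketch of a global pairwise alignment between a target sequence and a template sequence. It is not one of the methods surveyed in the chapter: it assumes plain identity scoring with a linear gap penalty (hypothetical values), whereas practical tools rely on substitution matrices, sequence profiles, or hidden Markov models. The function names `align` and `residue_pairs` are illustrative only.

```python
def align(target, template, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming (identity scoring, linear gaps)."""
    n, m = len(target), len(template)
    # score[i][j] = best score for aligning target[:i] with template[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if target[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # align the two residues
                              score[i - 1][j] + gap,    # gap in the template
                              score[i][j - 1] + gap)    # gap in the target
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if target[i - 1] == template[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                a.append(target[i - 1])
                b.append(template[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(target[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(template[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def residue_pairs(aligned_target, aligned_template):
    """Return 0-based (target_index, template_index) pairs for aligned residues."""
    pairs, ti, si = [], 0, 0
    for x, y in zip(aligned_target, aligned_template):
        if x != "-" and y != "-":
            pairs.append((ti, si))
        if x != "-":
            ti += 1
        if y != "-":
            si += 1
    return pairs


aligned_target, aligned_template = align("ACDEFG", "ACDFG")
print(aligned_target)
print(aligned_template)
print(residue_pairs(aligned_target, aligned_template))
```

The list of index pairs is the correspondence a modeling program would use to copy coordinates from template residues to target residues; target residues that do not appear in any pair (here the one opposite the gap) have no template counterpart and must be built by other means.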

Acknowledgments

Ana Vencloviené and members of Venclovas’ lab are gratefully acknowledged for useful comments and suggestions.

Author information

Corresponding author

Correspondence to Česlovas Venclovas .

Copyright information

© 2011 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Venclovas, Č. (2011). Methods for Sequence–Structure Alignment. In: Orry, A., Abagyan, R. (eds) Homology Modeling. Methods in Molecular Biology, vol 857. Humana Press. https://doi.org/10.1007/978-1-61779-588-6_3

  • DOI: https://doi.org/10.1007/978-1-61779-588-6_3

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-587-9

  • Online ISBN: 978-1-61779-588-6

  • eBook Packages: Springer Protocols
