Methods for Sequence–Structure Alignment

Venclovas, Česlovas

doi:10.1007/978-1-61779-588-6_3

Part of the book series: Methods in Molecular Biology (MIMB, volume 857)

Abstract

Homology modeling is based on the observation that related protein sequences adopt similar three-dimensional structures. Hence, a homology model of a protein can be derived using related protein structure(s) as modeling template(s). A key step in this approach is establishing the correspondence between residues of the protein to be modeled and those of the modeling template(s). This step, often referred to as sequence–structure alignment, is one of the major determinants of the accuracy of a homology model. This chapter gives an overview of methods for deriving sequence–structure alignments and discusses recent methodological developments leading to improved performance. However, no method is perfect. A second focus of the chapter is therefore how to find alignment regions that may contain errors and how to improve them. Finally, the chapter provides practical guidance on how to get the most out of the available tools in maximizing the accuracy of sequence–structure alignments.

Acknowledgments

Ana Vencloviené and members of Venclovas’ lab are gratefully acknowledged for useful comments and suggestions.

Author information

Authors and Affiliations

Institute of Biotechnology, Vilnius University, Vilnius, Lithuania
Česlovas Venclovas

Corresponding author

Correspondence to Česlovas Venclovas.

Editor information

Editors and Affiliations

Molsoft L.L.C., Sorrento Valley Road 11199, S209 San Diego, 92121, California, USA
Andrew J. W. Orry
Skaggs School of Pharmacy & Pharmaceutical Sciences, University of California, San Diego, Gilman Drive 9500, La Jolla, 92093, California, USA
Ruben Abagyan

About this protocol

Cite this protocol

Venclovas, Č. (2011). Methods for Sequence–Structure Alignment. In: Orry, A., Abagyan, R. (eds) Homology Modeling. Methods in Molecular Biology, vol 857. Humana Press. https://doi.org/10.1007/978-1-61779-588-6_3

DOI: https://doi.org/10.1007/978-1-61779-588-6_3
Published: 11 January 2012
Publisher Name: Humana Press
Print ISBN: 978-1-61779-587-9
Online ISBN: 978-1-61779-588-6
eBook Packages: Springer Protocols