Improving Pairwise Sequence Alignment between Distantly Related Proteins

  • Protocol
Comparative Genomics

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 395)

Summary

Sequence alignment between remotely related proteins has been one of the more difficult problems in structural biology. Improvements have been achieved by incorporating information that enhances the diversity of the substitution matrices. NdPASA is a web-based server that optimizes sequence alignments between proteins that share a low percentage of sequence identity. The program integrates structural information from the template sequence into a global alignment algorithm, using the neighbor-dependent propensities of amino acids for secondary structure as unique alignment parameters. NdPASA optimizes an alignment by evaluating the likelihood that a residue pair in the query sequence matches a corresponding residue pair that adopts a particular secondary structure in the template sequence. The server is designed to aid homologous protein structure modeling and is most effective when the structure of the template sequence is known. NdPASA can be accessed online at http://www.fenglab.org/bioserver.html.
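
The idea of adding a structure-dependent term to a global alignment can be illustrated with a minimal Python sketch. It uses a standard dynamic-programming global alignment (Needleman-Wunsch style, linear gap penalty) and adds a bonus when a query residue pair has a propensity for the secondary structure adopted by the corresponding template residue pair. Everything specific here is an assumption made for illustration: the toy substitution score, the table `TOY_PAIR_PROPENSITY`, the gap penalty, and the `weight` parameter are hypothetical and are not NdPASA's published parameters or scoring function.

```python
from typing import Tuple

# Hypothetical toy table: (query residue pair, secondary-structure state) -> propensity.
# H = helix, E = strand, C = coil. Values are made up for illustration only.
TOY_PAIR_PROPENSITY = {
    ("AL", "H"): 1.0,
    ("VI", "E"): 1.0,
    ("GP", "C"): 1.0,
}


def toy_substitution(a: str, b: str) -> float:
    """Stand-in for a real substitution matrix (assumption: identity scoring)."""
    return 2.0 if a == b else -1.0


def pair_bonus(query: str, i: int, template_ss: str, j: int, weight: float) -> float:
    """Bonus when query pair (i, i+1) favors the state shared by template pair (j, j+1)."""
    if i + 1 >= len(query) or j + 1 >= len(template_ss):
        return 0.0  # no neighbor available at the sequence end
    state = template_ss[j]
    if template_ss[j + 1] != state:
        return 0.0  # template pair does not share one secondary-structure state
    return weight * TOY_PAIR_PROPENSITY.get((query[i:i + 2], state), 0.0)


def global_align(query: str, template: str, template_ss: str,
                 gap: float = -2.0, weight: float = 1.0) -> Tuple[float, str, str]:
    """Global alignment with an added structure-dependent term (illustrative only)."""
    if len(template) != len(template_ss):
        raise ValueError("template and template_ss must have equal length")
    n, m = len(query), len(template)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "U"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "L"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = (score[i - 1][j - 1]
                    + toy_substitution(query[i - 1], template[j - 1])
                    + pair_bonus(query, i - 1, template_ss, j - 1, weight))
            up = score[i - 1][j] + gap      # gap in the template
            left = score[i][j - 1] + gap    # gap in the query
            best = max(diag, up, left)
            score[i][j] = best
            move[i][j] = "D" if best == diag else ("U" if best == up else "L")
    # Trace back from the bottom-right corner to recover the aligned strings.
    aligned_q, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "D":
            aligned_q.append(query[i - 1])
            aligned_t.append(template[j - 1])
            i, j = i - 1, j - 1
        elif step == "U":
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(template[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_q)), "".join(reversed(aligned_t))
```

Calling `global_align(query, template, template_ss)` returns the total score and the two gapped strings. Setting `weight=0.0` reduces the sketch to a plain sequence-only global alignment, which makes it easy to see how much the structure term alone changes a given alignment.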

Acknowledgments

The author would like to thank Wei Li and Junwen Wang for their contributions to the development of NdPASA. The author also acknowledges financial support from the National Institutes of Health (GM54630), the American Cancer Society (PRG9926301GMC), and an appropriation from the Commonwealth of Pennsylvania.

Copyright information

© 2007 Humana Press Inc.

About this protocol

Cite this protocol

Feng, J.A. (2007). Improving Pairwise Sequence Alignment between Distantly Related Proteins. In: Bergman, N.H. (ed.) Comparative Genomics. Methods in Molecular Biology™, vol 395. Humana Press. https://doi.org/10.1007/978-1-59745-514-5_16

  • DOI: https://doi.org/10.1007/978-1-59745-514-5_16

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-693-1

  • Online ISBN: 978-1-59745-514-5

  • eBook Packages: Springer Protocols
