Introduction

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

This chapter provides background on homology detection and fold recognition and surveys popular existing methods. In particular, it reviews homology detection methods from three perspectives: alignment-free versus alignment-based, sequence-based versus profile-based, and generative versus discriminative machine learning. Finally, it reviews a few popular scoring functions for sequence-based and profile-based protein alignment.

Author information

Corresponding author

Correspondence to Jinbo Xu.

Copyright information

© 2015 The Author(s)

About this chapter

Cite this chapter

Xu, J., Wang, S., Ma, J. (2015). Introduction. In: Protein Homology Detection Through Alignment of Markov Random Fields. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-14914-1_1

  • DOI: https://doi.org/10.1007/978-3-319-14914-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14913-4

  • Online ISBN: 978-3-319-14914-1

  • eBook Packages: Computer Science, Computer Science (R0)
