Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)


This chapter provides background on homology detection and fold recognition and surveys popular existing methods for both. In particular, it reviews homology detection methods along three axes: alignment-free versus alignment-based, sequence-based versus profile-based, and generative versus discriminative machine learning. Finally, it reviews a few popular scoring functions for sequence-based or profile-based protein alignment.


Keywords: Homology detection; Fold recognition; Alignment-free homology detection; Alignment-based homology detection; Profile-based protein alignment



Copyright information

© The Author(s) 2015

Authors and Affiliations

  1. Toyota Technological Institute, Chicago, USA
