Abstract
This chapter describes background and surveys existing popular methods on homology detection and fold recognition. In particular, this chapter reviews homology detection methods from the following perspectives: alignment-free versus alignment-based, sequence-based versus profile-based, and generative versus discriminative machine learning. Finally, this chapter also reviews a few popular scoring functions for sequence-based or profile-based protein alignment.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Brent, M.R.: Steady progress and recent breakthroughs in the accuracy of automated genome annotation. Nat. Rev. Genet. 9(1), 62–73 (2008)
Consortium, G.O.: The gene ontology project in 2008. Nucleic Acids Res. 36(suppl 1), D440–D444 (2008)
Watson, J.D., Laskowski, R.A., Thornton, J.M.: Predicting protein function from sequence and structural data. Curr. Opin. Struct. Biol. 15(3), 275–284 (2005)
Ginalski, K.: Comparative modeling for protein structure prediction. Curr. Opin. Struct. Biol. 16(2), 172–177 (2006)
Flöckner, H., et al.: Progress in fold recognition. Proteins Struct. Funct. Bioinf. 23(3), 376–386 (1995)
Eddy, S.R.: Profile hidden Markov models. Bioinformatics 14(9), 755–763 (1998)
Baker, D., Sali, A.: Protein structure prediction and structural genomics. Science 294(5540), 93–96 (2001)
Šali, A., et al.: Evaluation of comparative protein modeling by MODELLER. Proteins Struct. Funct. Bioinf. 23(3), 318–326 (1995)
Fariselli, P., et al.: The WWWH of remote homolog detection: the state of the art. Briefings Bioinf. 8(2), 78–87 (2007)
Wan, X.-F., Xu, D.: Computational methods for remote homolog identification. Curr. Protein Pept. Sci. 6(6), 527–546 (2005)
Madera, M., Gough, J.: A comparison of profile hidden Markov model procedures for remote homology detection. Nucleic Acids Res. 30(19), 4321–4328 (2002)
Jones, D.T., Taylor, W.R., Thornton, J.M.: The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. CABIOS 8(3), 275–282 (1992)
Grigoriev, I.V., Kim, S.-H.: Detection of protein fold similarity based on correlation of amino acid properties. Proc. Natl. Acad. Sci. 96(25), 14318–14323 (1999)
Deschavanne, P., Tuffery, P.: Exploring an alignment free approach for protein classification and structural class prediction. Biochimie 90(4), 615–625 (2008)
Jaakkola, T., Diekhans, M., Haussler, D.: A discriminative framework for detecting remote protein homologies. J. Comput. Biol. 7(1–2), 95–114 (2000)
Kuang, R., et al.: Profile-based string kernels for remote homology detection and motif extraction. J. Bioinf. Comput. Biol. 3(03), 527–550 (2005)
Leslie, C.S., Eskin, E., Noble, W.S.: The spectrum kernel: a string kernel for SVM protein classification. In: Pacific Symposium on Biocomputing (2002)
Liao, L., Noble, W.S.: Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. J. Comput. Biol. 10(6), 857–868 (2003)
Jaakkola, T., Diekhans, M., Haussler, D.: Using the Fisher kernel method to detect remote protein homologies. In: ISMB (1999)
Leslie, C.S., et al.: Mismatch string kernels for discriminative protein classification. Bioinformatics 20(4), 467–476 (2004)
Byvatov, E., Schneider, G.: Support vector machine applications in bioinformatics. Appl. Bioinf. 2(2), 67–77 (2002)
Jebara, T.: Machine Learning: Discriminative and Generative. Springer, Berlin (2004)
Balakrishnan, S., et al.: Learning generative models for protein fold families. Proteins Struct. Funct. Bioinf. 79(4), 1061–1078 (2011)
Thomas, J., Ramakrishnan, N., Bailey-Kellogg, C.: Protein design by sampling an undirected graphical model of residue constraints. IEEE/ACM Trans. Comput. Biol. Bioinf. 6(3), 506–516 (2009)
Shen, H.-B., Chou, K.-C.: Ensemble classifier for protein fold pattern recognition. Bioinformatics 22(14), 1717–1722 (2006)
Tan, A., Gilbert, D., Deville, Y.: Multi-class protein fold classification using a new ensemble machine learning approach (2003)
Dehzangi, A., Phon-Amnuaisuk, S., Dehzangi, O.: Using random forest for protein fold prediction problem: an empirical study. J. Inf. Sci. Eng. 26(6), 1941–1956 (2010)
Lundström, J., et al.: Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci. 10(11), 2354–2362 (2001)
McGuffin, L.J., Jones, D.T.: Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics 19(7), 874–881 (2003)
Zakeri, P., et al.: Protein fold recognition using geometric kernel data fusion. Bioinformatics btu118 (2014)
Do, C.B., Gross, S.S., Batzoglou, S.: CONTRAlign: discriminative training for protein sequence alignment. In: Research in Computational Molecular Biology. Springer, Berlin (2006)
Ding, C.H., Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17(4), 349–358 (2001)
Dong, Q., Zhou, S., Guan, J.: A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation. Bioinformatics 25(20), 2655–2662 (2009)
Sharma, A., et al.: A feature extraction technique using bi-gram probabilities of position specific scoring matrix for protein fold recognition. J. Theor. Biol. 320, 41–46 (2013)
Smith, T.F., Waterman, M.S.: Comparison of biosequences. Adv. Appl. Math. 2(4), 482–489 (1981)
Pearson, W.R.: Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics 11(3), 635–650 (1991)
Altschul, S.F., et al.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389–3402 (1997)
Pearson, W.R.: [5] Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183, 63–98 (1990)
Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. 89(22), 10915–10919 (1992)
Eddy, S.R.: HMMER: profile hidden Markov models for biological sequence analysis (2001)
Hughey, R., Krogh, A.: Hidden Markov models for sequence analysis: extension and analysis of the basic method. Comput. Appl. Biosci. CABIOS 12(2), 95–107 (1996)
Morgenstern, B., et al.: DIALIGN: finding local similarities by multiple sequence alignment. Bioinformatics 14(3), 290–294 (1998)
Probst, W.C., et al.: Sequence alignment of the G-protein coupled receptor superfamily. DNA Cell Biol. 11(1), 1–20 (1992)
Söding, J.: Protein homology detection by HMM–HMM comparison. Bioinformatics 21(7), 951–960 (2005)
Tomii, K., Akiyama, Y.: FORTE: a profile–profile comparison tool for protein fold recognition. Bioinformatics 20(4), 594–595 (2004)
Heger, A., Holm, L.: Picasso: generating a covering set of protein family profiles. Bioinformatics 17(3), 272–279 (2001)
Moult, J.: A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr. Opin. Struct. Biol. 15(3), 285–289 (2005)
Pruitt, K.D., Tatusova, T., Maglott, D.R.: NCBI reference sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33(suppl 1), D501–D504 (2005)
Bates, P.A., et al.: Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins Struct. Funct. Bioinf. 45(S5), 39–46 (2001)
Koonin, E.V., Wolf, Y.I., Aravind, L.: Protein fold recognition using sequence profiles and its application in structural genomics. Adv. Protein Chem. 54, 245–275 (2000)
Eddy, S.R.: Hidden markov models. Curr. Opin. Struct. Biol. 6(3), 361–365 (1996)
Bateman, A., et al.: The Pfam protein families database. Nucleic Acids Res. 32(suppl 1), D138–D141 (2004)
Bateman, A., et al.: The Pfam protein families database. Nucleic Acids Res. 30(1), 276–280 (2002)
Gough, J., Chothia, C.: SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Res. 30(1), 268–272 (2002)
Ma, J., et al.: MRFalign: protein homology detection through alignment of Markov random fields. PLoS Comput. Biol. 10(3), e1003500 (2014)
Yona, G., Levitt, M.: Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. J. Mol. Biol. 315(5), 1257–1275 (2002)
Rychlewski, L., Zhang, B., Godzik, A.: Fold and function predictions for fold and function predictions for. Fold Des. 3(4), 229–238 (1998)
Wang, G., Dunbrack, R.L.: Scoring profile-to-profile sequence alignments. Protein Sci. 13(6), 1612–1626 (2004)
Boyd, S., et al.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)
Daniels, N.M., et al.: SMURFLite: combining simplified Markov random fields with simulated evolution improves remote homology detection for beta-structural proteins into the twilight zone. Bioinformatics 28(9), 1216–1222 (2012)
Daniels, N.M., et al.: MRFy: remote homology detection for beta-structural proteins using Markov random fields and stochastic search. In: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics. ACM (2013)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
Copyright information
© 2015 The Author(s)
About this chapter
Cite this chapter
Xu, J., Wang, S., Ma, J. (2015). Introduction. In: Protein Homology Detection Through Alignment of Markov Random Fields. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-14914-1_1
Download citation
DOI: https://doi.org/10.1007/978-3-319-14914-1_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14913-4
Online ISBN: 978-3-319-14914-1
eBook Packages: Computer ScienceComputer Science (R0)