Introduction

Xu, Jinbo; Wang, Sheng; Ma, Jianzhu

doi:10.1007/978-3-319-14914-1_1

Chapter
First Online: 01 January 2015

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

This chapter provides background on homology detection and fold recognition and surveys popular existing methods. In particular, it reviews homology detection methods from three perspectives: alignment-free versus alignment-based, sequence-based versus profile-based, and generative versus discriminative machine learning. Finally, it reviews a few popular scoring functions for sequence-based or profile-based protein alignment.

Author information

Authors and Affiliations

Toyota Technological Institute, Chicago, IL, USA
Jinbo Xu, Sheng Wang & Jianzhu Ma

Corresponding author

Correspondence to Jinbo Xu.

About this chapter

Cite this chapter

Xu, J., Wang, S., Ma, J. (2015). Introduction. In: Protein Homology Detection Through Alignment of Markov Random Fields. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-14914-1_1

DOI: https://doi.org/10.1007/978-3-319-14914-1_1
Published: 23 January 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14913-4
Online ISBN: 978-3-319-14914-1
eBook Packages: Computer Science, Computer Science (R0)
