HMMs in Protein Fold Classification

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1552)

Abstract

A limitation of most HMMs is their inherently high dimensionality. We therefore developed several low-complexity model variations that can be applied even to protein families with few members, and we present them in this chapter. Each variation uses a hidden Markov model (HMM) with a small number of states, called a reduced state-space HMM, which is trained on both the amino acid sequence and the secondary structure of proteins whose 3D structure is known and is then used for protein fold classification. We used data from the Protein Data Bank and annotation from the SCOP database to train and evaluate the proposed HMM variations on a number of protein folds that belong to the major structural classes. The results indicate that the variations classify proteins about as well as, and in some cases better than, SAM, a widely used HMM-based method for protein classification. The major advantage of the proposed variations is that they employ a small number of states, and the algorithms used for training and scoring are of low complexity and thus relatively fast. The main variations examined are a reduced state-space HMM with seven states (7-HMM), a reduced state-space HMM with three states (3-HMM), and an optimized version of the three-state model in which an optimization process is applied to its scores (optimized 3-HMM).
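The general idea of a reduced state-space HMM can be illustrated with a minimal Python sketch. It is not the chapter's implementation, and several choices are assumptions made only for illustration: the three states are taken to be helix (H), strand (E), and coil (C); parameters are estimated by counting over labeled (sequence, secondary structure) pairs with a pseudocount; each fold gets its own model; and a query is assigned to the fold whose model gives the highest forward-algorithm log-likelihood. The function names and the toy training data are hypothetical.

```python
import math

STATES = "HEC"                    # assumed states: helix, strand, coil
AMINO = "ACDEFGHIKLMNPQRSTVWY"    # the 20 standard amino acids


def _normalize(counts):
    """Turn a dict of non-negative counts into probabilities."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def train(examples, pseudo=1.0):
    """Estimate (start, transition, emission) tables by counting.

    `examples` is a list of (sequence, secondary_structure) string pairs
    of equal length; `pseudo` is a pseudocount that avoids zero
    probabilities when a fold has few members.
    """
    start = {s: pseudo for s in STATES}
    trans = {s: {t: pseudo for t in STATES} for s in STATES}
    emit = {s: {a: pseudo for a in AMINO} for s in STATES}
    for seq, ss in examples:
        start[ss[0]] += 1
        for i, (residue, state) in enumerate(zip(seq, ss)):
            emit[state][residue] += 1
            if i > 0:
                trans[ss[i - 1]][state] += 1
    return (
        _normalize(start),
        {s: _normalize(trans[s]) for s in STATES},
        {s: _normalize(emit[s]) for s in STATES},
    )


def log_likelihood(seq, model):
    """Scaled forward algorithm: log P(seq | model) over all state paths."""
    start, trans, emit = model
    alpha = {s: start[s] * emit[s][seq[0]] for s in STATES}
    loglik = 0.0
    for residue in seq[1:]:
        scale = sum(alpha.values())
        loglik += math.log(scale)          # accumulate the scaling factors
        alpha = {s: value / scale for s, value in alpha.items()}
        alpha = {
            t: emit[t][residue] * sum(alpha[s] * trans[s][t] for s in STATES)
            for t in STATES
        }
    return loglik + math.log(sum(alpha.values()))


def classify(seq, models):
    """Return the fold whose model scores the query sequence highest."""
    return max(models, key=lambda fold: log_likelihood(seq, models[fold]))


# Toy, made-up training data (not taken from PDB or SCOP).
models = {
    "helical_fold": train([("AAAALLLL", "HHHHHHHH")]),
    "strand_fold": train([("VVVVIIII", "EEEEEEEE")]),
}
print(classify("ALAL", models))
```

With only three states, each model has 3 start, 9 transition, and 60 emission parameters, which is why such models can be estimated from few family members and scored in time linear in the sequence length.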

Author information

Corresponding author

Correspondence to Dimitrios I. Fotiadis.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Lampros, C., Papaloukas, C., Exarchos, T., Fotiadis, D.I. (2017). HMMs in Protein Fold Classification. In: Westhead, D., Vijayabaskar, M. (eds) Hidden Markov Models. Methods in Molecular Biology, vol 1552. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6753-7_2

  • DOI: https://doi.org/10.1007/978-1-4939-6753-7_2

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6751-3

  • Online ISBN: 978-1-4939-6753-7

  • eBook Packages: Springer Protocols
