Hidden Markov Models and Protein Secondary Structure Prediction

Chapter in Bioinformatics and the Cell

Abstract

A hidden Markov model (HMM) is used to infer the hidden states of a Markov process from observed data. For example, introns and exons are hidden states that must be inferred from the observed nucleotide sequence. Similarly, secondary structural elements such as alpha helices and beta sheets are hidden states that must be inferred from the observed amino acid sequence. How accurately an HMM infers hidden states depends on the transition and emission probability matrices obtained by training the HMM on representative observations. If the hidden states differ markedly in their probabilities of transitioning into one another, and if their emission probabilities are very different from one another, the HMM can be quite accurate. This chapter details the key HMM algorithms, such as the Viterbi algorithm for reconstructing the most probable sequence of hidden states and the forward algorithm for computing the probability of the observed sequence of events. Both are dynamic programming algorithms, a technique first introduced in the chapter on sequence alignment. As an illustrative example, an HMM is applied to reconstructing protein secondary structure.
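The three steps named above (training, the forward algorithm and the Viterbi algorithm) can be sketched in Python for a toy secondary-structure HMM. This is a minimal illustration, not the chapter's own model or code. The three-state labelling (`H` helix, `E` beta strand, `C` coil), the function names `train`, `forward` and `viterbi`, the add-one pseudocounts, and the `TRAIN` sequences are all hypothetical. The toy sequences are made-up examples, not real structures. Training here is supervised counting on sequences whose states are already labelled, and each hidden state emits one residue.

```python
import math

# Hypothetical labelling of hidden states: H = alpha helix,
# E = beta strand (sheet), C = coil.
STATES = ("H", "E", "C")
SYMBOLS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Hypothetical toy training set of (sequence, per-residue state labels).
# Made-up illustrations only, not real protein structures.
TRAIN = [
    ("AEELLKGPTVIVVY", "HHHHHHCCCEEEEE"),
    ("GSPVKVYVNGAEMLKRL", "CCCEEEEECCHHHHHHH"),
    ("PDGMEELAKKLLEGN", "CCCHHHHHHHHHHCC"),
]


def train(pairs, states, symbols, pseudo=1.0):
    """Supervised (count-based) estimation of start, transition and
    emission probabilities; `pseudo` is added to every count so that
    no probability is zero."""
    start = {s: pseudo for s in states}
    trans = {s: {t: pseudo for t in states} for s in states}
    emit = {s: {o: pseudo for o in symbols} for s in states}
    for obs, labels in pairs:
        assert len(obs) == len(labels)
        start[labels[0]] += 1
        for i, (o, s) in enumerate(zip(obs, labels)):
            emit[s][o] += 1
            if i > 0:
                trans[labels[i - 1]][s] += 1

    def normalize(counts):
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()}

    return (normalize(start),
            {s: normalize(trans[s]) for s in states},
            {s: normalize(emit[s]) for s in states})


def forward(obs, states, start, trans, emit):
    """Forward algorithm: P(obs), summed over all hidden-state paths.
    Plain probabilities are used for clarity; long sequences need
    scaling or log-space arithmetic to avoid underflow."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {t: sum(alpha[s] * trans[s][t] for s in states) * emit[t][o]
                 for t in states}
    return sum(alpha.values())


def viterbi(obs, states, start, trans, emit):
    """Viterbi algorithm (in log space): the single most probable
    hidden-state path and its log probability."""
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    backpointers = []
    for o in obs[1:]:
        pointer, new_score = {}, {}
        for t in states:
            # Best predecessor state for arriving in state t at this position.
            best = max(states, key=lambda s: score[s] + math.log(trans[s][t]))
            pointer[t] = best
            new_score[t] = (score[best] + math.log(trans[best][t])
                            + math.log(emit[t][o]))
        backpointers.append(pointer)
        score = new_score
    # Trace back from the best final state to recover the whole path.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return "".join(reversed(path)), score[last]


if __name__ == "__main__":
    start, trans, emit = train(TRAIN, STATES, SYMBOLS)
    query = "MAELLKKGPVIVV"
    print("P(sequence) =", forward(query, STATES, start, trans, emit))
    print("Viterbi path:", viterbi(query, STATES, start, trans, emit)[0])
```

Both recursions have the same dynamic-programming shape. The forward algorithm sums over predecessor states, while Viterbi takes the maximum and keeps a backpointer so the best path can be traced back. Viterbi works with log probabilities because maxima and sums of logs avoid underflow directly. The forward sketch uses raw probabilities for readability, which is only safe for short sequences.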



Postscript

We encounter informal applications of HMMs in daily life. During a telephone call, parents whose ears have been trained by years of caring for their children can often detect their children’s hidden troubles from their voices alone. In contrast, people unfamiliar with each other often find it frustratingly difficult to make sense of each other’s behavior, and misunderstanding ensues. The most agonizing moment for me in watching the movie “Waterloo Bridge” is when Lady Margaret Cronin fails to detect the distress experienced by Myra.

I once heard a story about the late Stephen Jay Gould giving a talk on evolution to the congregation of All Souls Church in New York. When the guest and his hosts were having lunch together, someone suggested that they go around the table and introduce themselves. At that point Gould said something that seemed extraordinarily rude, to the effect that he did not really care who the hosts were because he would never see them again. Gould’s name instantly became synonymous with rudeness among the church members.

However, soon after the incident, the members of the church learned from the newspaper that Gould had died of cancer and that his lecture at the church had in fact been his last public engagement; he had reserved all his remaining time for finishing his 1464-page magnum opus, “The Structure of Evolutionary Theory.” They realized that, at the moment the seemingly rude remark erupted, Gould must have felt melancholy, as anyone would on knowing that his days were numbered, and that he was merely stating the heartbreaking truth that he would never see anyone around the table again.

In HMM parlance, Gould’s remark is the emitted event from which the listeners should ideally have inferred his hidden, melancholy state of mind. They did make an inference, but it was wrong. Worse, they did not realize that it was wrong; otherwise they could have prayed more for Gould.

Stephen Jay Gould spent his whole life fighting two kinds of fundamentalists: religious fundamentalists, who believe that God is a micromanager of everything and that the Bible literally encompasses all truths about nature, and evolutionary fundamentalists, who believe that every bit of biodiversity manifests adaptation and results from natural selection. All Souls Church is perhaps Gould’s counterpart in the religious sphere, so I would have expected him to have an easy time with the members of this very liberal church. Yet misunderstanding still arose, and it could have lasted a long time had Gould’s death not been so well publicized.

It is truly enigmatic and paradoxical that, although advanced computational algorithms help us infer what is hidden, we still seem to make no progress in understanding each other or ourselves. The ancient Greek sage Plato discovered the root cause of all misunderstanding and evil: arrogance, the illusion that we are better than others. He illustrated this point with his famous allegory of the cave.

Imagine prisoners chained inside a cave since childhood, their heads immobilized so that their eyes were fixed on a gigantic wall. Immediately behind the prisoners was a road along which men, animals, and other things traveled. Behind the road was an enormous fire that projected the shadows of the travelers onto the wall the prisoners were facing. The voices of the travelers also echoed off the wall in such a way that the prisoners believed the words came from the shadows. Gradually, the prisoners became quite good at identifying the travelers by their shadows and voices. The shadows and voices, together with the prisoners’ interpretation of them, constituted the world of reality in the prisoners’ minds.

Now suppose a prisoner was freed and went outside the cave. Gradually he would comprehend a new reality from what he could sense. Once thus enlightened, he would naturally want to return to the cave to convey the new reality to his fellow prisoners. Unfortunately, once back in the cave, he found himself much less able than his fellow prisoners to identify the travelers by their shadows. Perceived as inferior and stupid, he failed completely to communicate the new reality to his fellow prisoners, who believed they knew better. They were too arrogant to listen.

It is arrogance in the minds of the prisoners that prevents them from comprehending the new reality hidden from them. It is arrogance in the minds of religious and evolutionary fundamentalists that prevents them from understanding each other. It is arrogance in the minds of presidents and prime ministers that prolongs misunderstanding among nations. Arrogance is Satan in Christianity.

I have had the privilege of meeting some of these religious and evolutionary fundamentalists. What is particularly ironic is that they all know Plato’s allegory of the cave quite well, yet each side casts itself as the enlightened one that has seen the real world and sneers at the other as the chained prisoners with restricted vision.

None of us is omnipresent and eternal, so our view of the world is the same as that of the chained prisoners. Without grasping this painful but basic truth, we will misinterpret what we see or hear, with or without an HMM.


Copyright information

© 2018 Springer Science+Business Media LLC

Cite this chapter

Xia, X. (2018). Hidden Markov Models and Protein Secondary Structure Prediction. In: Bioinformatics and the Cell. Springer, Cham. https://doi.org/10.1007/978-3-319-90684-3_7
