Hidden Markov Models and Protein Secondary Structure Prediction

Xia, Xuhua

doi:10.1007/978-3-319-90684-3_7


Abstract

A hidden Markov model (HMM) is used to infer the hidden states of a Markov model from observed data. For example, introns and exons are hidden states that need to be inferred from the observed nucleotide sequences. Similarly, secondary structural elements such as alpha helices and beta sheets are hidden states that need to be inferred from observed amino acid sequences. The accuracy of an HMM in inferring hidden states depends on the transition probability matrix and the emission probability matrix derived from training the HMM with representative observations. If the transition probabilities differ strongly among states, and if the emission probabilities of the hidden states differ strongly from each other, then an HMM can be quite accurate. This chapter details the key algorithms used in HMM, such as the Viterbi algorithm for reconstructing the hidden states and the forward algorithm for computing the probability of the observed sequence of events. Both the Viterbi and forward algorithms are dynamic programming algorithms, a class of algorithms first introduced in the chapter on sequence alignment. HMM is applied to reconstructing protein secondary structure as an illustrative example.
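
To make the two algorithms named in the abstract concrete, here is a minimal Python sketch on a toy two-state model. It is not the chapter's implementation. The state labels (H for helix, C for coil), the four-letter amino acid alphabet, and every probability below are hypothetical values chosen for illustration, not estimates from training data.

```python
import math

# Hypothetical toy HMM: all numbers are illustrative assumptions.
STATES = ("H", "C")  # H = helix, C = coil
START = {"H": 0.5, "C": 0.5}
TRANS = {
    "H": {"H": 0.9, "C": 0.1},
    "C": {"H": 0.2, "C": 0.8},
}
EMIT = {
    "H": {"A": 0.4, "L": 0.4, "G": 0.1, "P": 0.1},
    "C": {"A": 0.1, "L": 0.1, "G": 0.4, "P": 0.4},
}


def viterbi(obs):
    """Return the most probable hidden-state path for obs (log space)."""
    # v[s] = log probability of the best path ending in state s
    v = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backpointers = []
    for x in obs[1:]:
        new_v, ptr = {}, {}
        for s in STATES:
            # Best predecessor state r for arriving in state s
            best = max(STATES, key=lambda r: v[r] + math.log(TRANS[r][s]))
            new_v[s] = v[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][x])
            ptr[s] = best
        v = new_v
        backpointers.append(ptr)
    # Trace back from the best final state
    path = [max(STATES, key=lambda s: v[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


def forward(obs):
    """Return P(obs), summed over all hidden-state paths."""
    f = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for x in obs[1:]:
        f = {
            s: EMIT[s][x] * sum(f[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    return sum(f.values())


print(viterbi("AALLGGPP"))
print(forward("AALLGGPP"))
```

With these assumed parameters, `viterbi("AALLGGPP")` returns `HHHHCCCC`: the emission probabilities of the two states differ fourfold for every residue, which is the situation the abstract describes as favorable for accurate inference. The forward sketch multiplies raw probabilities, so for long sequences it would need scaling or log-space arithmetic to avoid underflow.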



Author information

Authors and Affiliations

University of Ottawa CAREG and Biology Department, Ottawa, ON, Canada
Xuhua Xia


Postscript

We see informal applications of HMM in our daily life. During a telephone call, parents whose ears have been trained by many years of caring for their children can often detect hidden troubles of their children based only on the voice of the latter. In contrast, people unfamiliar with each other often find it frustratingly difficult to make sense of each other’s behavior, and misunderstanding ensues. The most agonizing moment for me in watching the movie “Waterloo Bridge” is when Lady Margaret Cronin fails to detect the distress experienced by Myra.

I once heard a story about the late Stephen Jay Gould giving a talk on evolution to the congregation of an All Souls Church in New York. When the guest and hosts were having lunch together, someone suggested that they should go around the table to introduce themselves. At that point Gould said something that seemed to be extraordinarily rude, something to the effect that he did not really care who the hosts were as he would never see them again. The name of Gould instantly became synonymous with rudeness among the church members.

However, soon after the incident, the members of the church learned from the newspaper that Gould had died of cancer and that his lecture in the church was in fact Gould’s last public engagement – he reserved all the rest of his time to finish his 1464-page magnum opus entitled “The Structure of Evolutionary Theory.” They realized that, at that moment when the seemingly rude remark erupted, Gould must have felt melancholy, as everyone would, knowing that his days were numbered, and that he was merely stating a heartbreaking truth that he would never see anyone around the table again.

In HMM parlance, the remark by Gould is the emitted event from which the listeners should ideally have been able to infer his hidden melancholy state of mind. An inference they did make, but it was wrong. Worse, they did not realize that it was wrong; otherwise they could have prayed more for Gould.

Stephen Jay Gould had spent all his life fighting two kinds of fundamentalists, the religious fundamentalists who believe that God is a micromanager of everything and that the Bible literally encompasses all truths in nature, and the evolutionary fundamentalists who believe that every bit of biodiversity manifests adaptation and results from natural selection. All Souls Church is perhaps the equivalent of Gould in the religious field. I would have expected Gould to have an easy time with members of this very liberal church. Yet misunderstanding still arose, and the misunderstanding could have lasted for a long time if Gould’s death had not been so well publicized.

It is truly enigmatic and paradoxical that, with advanced computational algorithms helping us to infer the hidden, we still do not seem to make any progress in understanding each other and in understanding ourselves. The ancient Greek sage, Plato, had discovered the root cause of all misunderstanding and evil. It is called arrogance, or the illusion that we are better than others. Plato illustrated his point with his famous allegory of the cave.

Imagine prisoners chained inside a cave since childhood, with their heads immobilized in such a way that their eyes were fixed on a gigantic wall. Immediately behind the prisoners was a road along which men, animals, and other things traveled. Behind the road was an enormous fire that projected the shadows of the travelers onto the wall that the prisoners were facing. Also, the voices of the travelers echoed from the wall in such a way that the prisoners believed that the words came from the shadows. Gradually, the prisoners became quite good at identifying the travelers by their shadows and voices. The shadows and the voices, as well as the interpretation of the shadows and voices by the prisoners, constituted the world of reality in the mind of the prisoners.

Now suppose a prisoner was freed and went outside the cave. Gradually he would comprehend a new reality from what he could sense. Once thus enlightened, he naturally would want to return to the cave to convey the new reality to his fellow prisoners. Unfortunately, once back in the cave, he found himself much less able than his fellow prisoners to identify the travelers by their shadows. Being thus perceived as inferior and stupid, he failed completely in communicating the new reality to his fellow prisoners, who believed that they knew better. The fellow prisoners were too arrogant to listen.

It is the arrogance in the mind of the prisoners that prevents them from comprehending the new reality hidden from them. It is the arrogance in the mind of the religious fundamentalists and the evolutionary fundamentalists that prevents them from understanding each other. It is the arrogance in the mind of the presidents and prime ministers that prolongs the misunderstanding among nations. Arrogance is Satan in Christianity.

I have had the privilege of meeting some of the religious and evolutionary fundamentalists. What is particularly ironic is that they all know Plato’s allegory of the cave quite well, but all point to themselves as the enlightened ones who have seen the real world and sneer at the other party as the chained prisoners with restricted vision.

None of us is omnipresent or eternal, and our view of the world is consequently the same as that of the chained prisoners. Without grasping this painful but basic truth, we will misinterpret what we see or hear, with or without HMM.

About this chapter

Cite this chapter

Xia, X. (2018). Hidden Markov Models and Protein Secondary Structure Prediction. In: Bioinformatics and the Cell. Springer, Cham. https://doi.org/10.1007/978-3-319-90684-3_7

DOI: https://doi.org/10.1007/978-3-319-90684-3_7
Published: 06 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90682-9
Online ISBN: 978-3-319-90684-3
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)
