
Analysis of Biological Sequences

  • Chapter
Markov Models for Pattern Recognition

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)


Abstract

In the field of biological sequence analysis, strong reservations still seem to exist against applying statistical pattern recognition techniques such as HMMs. These reservations are most probably due to the fact that researchers in this field have a strong desire to be able to explain every detail of a model from a biological viewpoint. Therefore, several publications on HMM-based biological sequence analysis contain sections explicitly pointing out that HMMs achieve, in principle, the same results as the more traditional methods. Usually only the underlying mathematical theory is named as their main advantage, and less frequently the fact that the required model parameters can be trained automatically. In this chapter we will first briefly present the two most important software tools that were developed for the analysis of biological sequences by means of hidden Markov models, namely HMMER and SAM. In the final section of the chapter we will present a system for the classification of proteins which was developed on the basis of ESMERALDA.


Notes

  1. The original HMMER 2.1.1 documentation, which today is only partially available via archive.org, says that ‘It’s “hammer”: as in, a more precise tool than a BLAST. :)’.

  2. The name “Plan 7” for the structure, improved with respect to the older “Plan 9” model architecture, was an allusion to the title of a science fiction movie that is sometimes referred to as the worst movie ever made. Unfortunately, this humorous remark can no longer be found in the HMMER documentation.

  3. The profile HMM structure with match, insert, and delete states is incorrectly referred to as a “linear HMM” in the SAM documentation.

References

  1. Bairoch, A., Apweiler, R.: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28(1), 45–48 (2000)

  2. Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)


  3. Eddy, S.R.: Profile Hidden Markov Models. Bioinformatics 14(9), 755–763 (1998)


  4. Finn, R.D., Clements, J., Eddy, S.R.: HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39(suppl 2), W29–W37 (2011)

  5. Howard Hughes Medical Institute: HMMER: biological sequence analysis using profile hidden Markov models (2013). http://hmmer.janelia.org/

  6. Hughey, R., Karplus, K., Krogh, A.: SAM: Sequence alignment and modeling software system (2005). http://www.cse.ucsc.edu/research/compbio/sam.html

  7. Plötz, T., Fink, G.A.: Feature extraction for improved profile HMM based biological sequence analysis. In: Proc. Int. Conf. on Pattern Recognition, pp. 315–318 (2004)


  8. Plötz, T., Fink, G.A.: A new approach for HMM based protein sequence modeling and its application to remote homology classification. In: Proc. Workshop Statistical Signal Processing, Bordeaux, France (2005)


  9. Plötz, T., Fink, G.A.: Robust remote homology detection by feature based Profile Hidden Markov Models. Stat. Appl. Genet. Mol. Biol. 4(1) (2005)


  10. Plötz, T., Fink, G.A.: Pattern recognition methods for advanced stochastic protein sequence analysis using HMMs. Pattern Recognit. 39, 2267–2280 (2006). Special Issue on Bioinformatics


  11. Punta, M., Coggill, P.C., Eberhardt, R.Y., Mistry, J., Tate, J., Boursnell, C., Pang, N., Forslund, K., Ceric, G., Clements, J., Heger, A., Holm, L., Sonnhammer, E.L.L., Eddy, S.R., Bateman, A., Finn, R.D.: The Pfam protein families database. Nucleic Acids Res. 40(D1), D290–D301 (2012). http://nar.oxfordjournals.org/content/40/D1/D290.full.pdf+html



Copyright information

© 2014 Springer-Verlag London

About this chapter

Cite this chapter

Fink, G.A. (2014). Analysis of Biological Sequences. In: Markov Models for Pattern Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6308-4_15


  • DOI: https://doi.org/10.1007/978-1-4471-6308-4_15

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-6307-7

  • Online ISBN: 978-1-4471-6308-4

  • eBook Packages: Computer Science (R0)
