Parameter Training

Comparative Gene Finding

Part of the book series: Computational Biology ((COBO,volume 11))

Abstract

The training of model parameters is one of the most challenging problems when constructing a gene finding algorithm. It involves finding the parameter estimates that optimise the performance of the model, based on a set of training sequences. In this chapter we describe some of the techniques most commonly used for this purpose in gene finding algorithms. First we go through the features commonly included in gene finding algorithms and discuss the characteristics they exhibit. Next we focus on three main gene characteristics, namely feature length distributions, sequence compositional measures, and splice site detection models. Each section details a number of the most commonly used algorithms for the characteristic in question. In particular, the splice site section covers various Markovian models, neural networks, linear discriminant analysis, Bayesian networks and support vector machines.
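
To fix ideas, the sketch below illustrates the simplest kind of estimation problem the chapter treats: training a first-order Markov chain as a sequence-composition model from a set of training sequences, using relative-frequency estimates with add-one pseudocounts. This is a minimal illustrative sketch in Python, not code from the book; the function name, pseudocount choice and toy sequences are assumptions made here for illustration only.

from collections import Counter

ALPHABET = "ACGT"

def train_markov_chain(training_seqs, pseudocount=1.0):
    """Estimate P(next base | current base) from training sequences,
    using observed dinucleotide frequencies with add-one pseudocounts
    so that unseen transitions keep a nonzero probability."""
    counts = Counter()
    for seq in training_seqs:
        seq = seq.upper()
        for a, b in zip(seq, seq[1:]):
            if a in ALPHABET and b in ALPHABET:
                counts[(a, b)] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[(a, b)] for b in ALPHABET) + pseudocount * len(ALPHABET)
        for b in ALPHABET:
            probs[(a, b)] = (counts[(a, b)] + pseudocount) / total
    return probs

if __name__ == "__main__":
    # Toy training data; a real gene finder would train on annotated genomic sequence.
    model = train_markov_chain(["ACGTACGTGGCC", "ATGGCGTAA"])
    print(model[("A", "C")], model[("G", "G")])

More elaborate feature models, such as the splice site detectors discussed later in the chapter, replace this simple counting step with the iterative or discriminative training procedures described in the text.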

Author information

Correspondence to Marina Axelson-Fisk.


Copyright information

© 2010 Springer-Verlag London

About this chapter

Cite this chapter

Axelson-Fisk, M. (2010). Parameter Training. In: Comparative Gene Finding. Computational Biology, vol 11. Springer, London. https://doi.org/10.1007/978-1-84996-104-2_6

  • DOI: https://doi.org/10.1007/978-1-84996-104-2_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84996-103-5

  • Online ISBN: 978-1-84996-104-2

  • eBook Packages: Computer Science, Computer Science (R0)
