Journal of Statistical Physics, Volume 142, Issue 6, pp 1187–1205

Statistical Mechanics of Transcription-Factor Binding Site Discovery Using Hidden Markov Models

  • Pankaj Mehta
  • David J. Schwab
  • Anirvan M. Sengupta
Article

Abstract

Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the “inverse” statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it.
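The equivalence mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and it assumes a simplified setting: each bound TF is a hard rod covering `rod_len` consecutive bases, background positions carry weight 1, and `weights[i]` is a hypothetical Boltzmann weight (fugacity times the exponential of minus the binding energy) for a rod whose left end sits at position `i`. Under these assumptions the partition function obeys a one-step recursion with the same structure as the HMM forward algorithm. The function names and the numerical weights below are illustrative only.

```python
import itertools
import math


def partition_function(weights, rod_len):
    """Partition function of non-overlapping rods on a 1D lattice.

    weights[i] is the (assumed) Boltzmann weight of a rod starting at site i.
    The lattice has len(weights) + rod_len - 1 sites, so every listed start fits.
    """
    n_sites = len(weights) + rod_len - 1
    z = [0.0] * (n_sites + 1)
    z[0] = 1.0  # empty prefix
    for j in range(1, n_sites + 1):
        # Case 1: site j-1 is unbound (background state, weight 1).
        z[j] = z[j - 1]
        # Case 2: a rod ends exactly at site j-1, so it started at j - rod_len.
        if j >= rod_len:
            start = j - rod_len
            z[j] += weights[start] * z[start]
    return z[n_sites]


def brute_force(weights, rod_len):
    """Sum over all non-overlapping rod placements by direct enumeration."""
    total = 0.0
    for mask in itertools.product([0, 1], repeat=len(weights)):
        starts = [i for i, bound in enumerate(mask) if bound]
        # Hard-rod constraint: consecutive rods must not overlap.
        if all(b - a >= rod_len for a, b in zip(starts, starts[1:])):
            total += math.prod(weights[i] for i in starts)
    return total


if __name__ == "__main__":
    example_weights = [0.5, 2.0, 1.0, 0.25]  # illustrative values only
    print(partition_function(example_weights, rod_len=2))
    print(brute_force(example_weights, rod_len=2))
```

The recursion runs in time linear in the sequence length, whereas the enumeration is exponential and serves only as a check on small inputs. In an HMM reading, the ratio of each term to the running total plays the role of a normalized forward probability.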

Keywords

  • Bioinformatics
  • Hidden Markov Models
  • One-dimensional statistical mechanics
  • Fisher information
  • Machine learning

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Pankaj Mehta (1)
  • David J. Schwab (2)
  • Anirvan M. Sengupta (3)
  1. Dept. of Physics, Boston University, Boston, USA
  2. Dept. of Molecular Biology and Lewis-Sigler Institute, Princeton University, Princeton, USA
  3. BioMAPS and Dept. of Physics, Rutgers University, Piscataway, USA