Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Machine Learning in Computational Biology

  • Cornelia Caragea
  • Vasant Honavar
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_636


Data mining in bioinformatics; Data mining in computational biology; Data mining in systems biology; Machine learning in bioinformatics; Machine learning in systems biology


Advances in high-throughput sequencing and “omics” technologies, and the resulting exponential growth in macromolecular sequence, structure, and gene expression data, have transformed biology from a data-poor science into an increasingly data-rich one. Despite these advances, biology today, much like physics before Newton and Leibniz, remains a largely descriptive science. Machine learning [6] currently offers some of the most cost-effective tools for building predictive models from biological data, e.g., for annotating new genomic sequences, predicting macromolecular function, identifying functionally important sites in proteins, identifying genetic markers of diseases, and discovering the networks of genetic interactions that...


Recommended Reading

  1. Andorf C, Dobbs D, Honavar V. Exploring inconsistencies in genome-wide protein function annotations: a machine learning approach. BMC Bioinform. 2007;8(1):284.
  2. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.
  3. Baldi P, Brunak S. Bioinformatics: the machine learning approach. Cambridge, MA: MIT Press; 2001.
  4. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. 2007;35(Database issue):D21–5.
  5. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res. 2000;28(1):235–42.
  6. Bishop CM. Pattern recognition and machine learning. Berlin: Springer; 2006.
  7. Boutell MR, Luo J, Shen X, Brown CM. Learning multi-label scene classification. Pattern Recogn. 2004;37(9):1757–71.
  8. Bruggeman FJ, Westerhoff HV. The nature of systems biology. Trends Microbiol. 2007;15(1):45–50.
  9. Caragea C, Sinapov J, Dobbs D, Honavar V. Assessing the performance of macromolecular sequence classifiers. In: Proceedings of the IEEE 7th International Symposium on Bioinformatics and Bioengineering; 2007. p. 320–6.
  10. de Jong H. Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 2002;9(1):67–103.
  11. Dietterich TG. Ensemble methods in machine learning. In: Proceedings of the 1st International Workshop on Multiple Classifier Systems. Berlin: Springer; 2000. p. 1–15.
  12. Dietterich TG. Machine learning for sequential data: a review. In: Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition; 2002. p. 15–30.
  13. El-Manzalawy Y, Dobbs D, Honavar V. On evaluating MHC-II binding peptide prediction methods. PLoS One. 2008;3(9):e3268.
  14. El-Manzalawy Y, Dobbs D, Honavar V. Predicting linear B-cell epitopes using string kernels. J Mol Recognit. 2008;21(4):243–55.
  15. Friedman N, Linial M, Nachman I, Pe’er D. Using Bayesian networks to analyze expression data. J Comput Biol. 2000;7(3–4):601–20.
  16. Galperin MY. The molecular biology database collection: 2008 update. Nucleic Acids Res. 2008;36(Database issue):D2–4.
  17. Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003;3(7–8):1157–82.
  18. Hecker L, Alcon T, Honavar V, Greenlee H. Querying multiple large-scale gene expression datasets from the developing retina using a seed network to prioritize experimental targets. Bioinform Biol Insights. 2008;2:91–102.
  19. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi A-L. The large-scale organization of metabolic networks. Nature. 2000;407(6804):651–4.
  20. Lahdesmaki H, Shmulevich I, Yli-Harja O. On learning gene regulatory networks under the Boolean network model. Mach Learn. 2003;52(1–2):147–67.
  21. Terribilini M, Lee J-H, Yan C, Jernigan RL, Honavar V, Dobbs D. Predicting RNA-binding sites from amino acid sequence. RNA. 2006;12(8):1450–62.
  22. Yan C, Terribilini M, Wu F, Jernigan RL, Dobbs D, Honavar V. Predicting DNA-binding sites of proteins from amino acid sequence. BMC Bioinform. 2006;7:262.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Computer Science and Engineering, University of North Texas, Denton, USA
  2. Iowa State University, Ames, USA

Section editors and affiliations

  • Louiqa Raschid (1)
  1. Robert H. Smith School of Business, University of Maryland, College Park, USA