An Automated ILP Server in the Field of Bioinformatics

  • Conference paper
Inductive Logic Programming (ILP 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2157)

Abstract

The identification of evolutionarily related (homologous) proteins is a key problem in molecular biology. Here we present an inductive logic programming based method, Homology Induction (HI), which acts as a filter for existing sequence similarity searches to improve their performance in the detection of remote protein homologies. HI performs a PSI-BLAST search to generate positive, negative, and uncertain examples, and collects descriptions of these examples. It then learns rules to discriminate the positive from the negative examples. The rules are used to filter the uncertain examples in the “twilight zone”. HI uses a multitable database of 51,430,710 pre-fabricated facts from a variety of biological sources, and the inductive logic programming system Aleph to induce rules. HI was tested on an independent set of protein sequences with less than or equal to 40 per cent sequence similarity (PDB40D). ROC analysis shows that HI can significantly improve existing similarity searches. The method is automated and can be used via a web/mail interface.
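
The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Hit` class, the E-value cut-offs `pos_max` and `neg_min`, and the "kinase" rule are hypothetical and are not taken from the paper. A plain Python predicate stands in for a rule that Aleph would induce from the positive and negative examples.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Hit:
    """One search hit: an identifier, its E-value, and descriptive facts."""
    protein_id: str
    e_value: float
    facts: FrozenSet[str] = frozenset()


def partition_hits(
    hits: List[Hit], pos_max: float = 1e-3, neg_min: float = 10.0
) -> Tuple[List[Hit], List[Hit], List[Hit]]:
    """Split hits into positive, negative, and uncertain examples.

    The two cut-offs are assumed values chosen for illustration only.
    """
    positives = [h for h in hits if h.e_value <= pos_max]
    negatives = [h for h in hits if h.e_value >= neg_min]
    uncertain = [h for h in hits if pos_max < h.e_value < neg_min]
    return positives, negatives, uncertain


def filter_uncertain(
    uncertain: List[Hit], rules: List[Callable[[Hit], bool]]
) -> List[Hit]:
    """Keep the uncertain hits that are covered by at least one rule."""
    return [h for h in uncertain if any(rule(h) for rule in rules)]


# Toy data: identifiers, E-values, and facts are all made up.
hits = [
    Hit("p1", 1e-30, frozenset({"kinase"})),
    Hit("p2", 50.0, frozenset({"membrane"})),
    Hit("p3", 0.5, frozenset({"kinase"})),
    Hit("p4", 2.0, frozenset({"membrane"})),
]

positives, negatives, uncertain = partition_hits(hits)

# Hypothetical stand-in for a learned rule: "the hit is annotated as a kinase".
rules = [lambda h: "kinase" in h.facts]

kept = filter_uncertain(uncertain, rules)
print([h.protein_id for h in kept])
```

In this toy run, `p3` and `p4` fall between the two cut-offs, and only `p3` is covered by the stand-in rule, so it is the only uncertain hit retained.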

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Karwath, A., King, R.D. (2001). An Automated ILP Server in the Field of Bioinformatics. In: Rouveirol, C., Sebag, M. (eds) Inductive Logic Programming. ILP 2001. Lecture Notes in Computer Science, vol 2157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44797-0_8

  • DOI: https://doi.org/10.1007/3-540-44797-0_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42538-0

  • Online ISBN: 978-3-540-44797-9

  • eBook Packages: Springer Book Archive
