An Automated ILP Server in the Field of Bioinformatics

Karwath, Andreas; King, Ross D.

doi:10.1007/3-540-44797-0_8

Andreas Karwath³ &
Ross D. King³

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 2157))

Included in the following conference series:

International Conference on Inductive Logic Programming

322 Accesses

Abstract

The identification of evolutionary related (homologous) proteins is a key problem in molecular biology. Here we present a inductive logic programming based method, Homology Induction (HI), which acts as a filter for existing sequence similarity searches to improve their performance in the detection of remote protein homologies. HI performs a PSI-BLAST search to generate positive, negative, and uncertain examples, and collects descriptions of these examples. It then learns rules to discriminate the positive and negative examples. The rules are used to filter the uncertain examples in the “twilight zone”. HI uses a multitable database of 51,430,710 pre-fabricated facts from a variety of biological sources, and the inductive logic programming system Aleph to induce rules. Hi was tested on an independent set of protein sequences with equal or less than 40 per cent sequence similarity (PDB40D). ROC analysis is performed showing that HI can significantly improve existing similarity searches. The method is automated and can be used via a web/mail interface.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

S. F. Altschul, W. Gish, W. Miller, Eugene W. Myers, and D. J. Lipman. Basic local alignment search tool. J. Mol. Biol., 215:403–410, 1990.
Google Scholar
S. F. Altschul, T L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17):3389–3402, 1997.
Article Google Scholar
A. Bairoch and R. Apweiler. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 2000. Nucleic Acids Research, 28:45–48, 2000.
Article Google Scholar
A. P. Bradley. The use of area under ROC curve in the evaluation of learning algorithms. Pattern Recognition, 30(7):1145–1159, 1995.
Article Google Scholar
L. Breiman. Bagging predictors. Machine Learning, 26(2):123–140, 1996.
Google Scholar
L. Dehaspe. Frequent Pattern Discovery in First-Order Logic. PhD thesis, Department of Computer Science, Katholieke Universiteit Leuven, Belgium, 1998.
Google Scholar
S. Dzeroski. Inductive logic programming and knowledge discovery. In U. M. Fayyad, G. Piatetsky-Sharpiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 117–152. AAAI/MIT Press, 1996.
Google Scholar
J. P. Egan. Signal Detection Theory and ROC Analysis. Cognition and Perception. Academic Press, New York, 1975.
Google Scholar
D. Eisenberg. Three-dimensional structure of membrane and surface proteins. Ann. Rev. Biochem, 53:595–623, 1984.
Article Google Scholar
Y. Freud and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
Article MathSciNet Google Scholar
U. Hobohm and C. Sander. A sequence property approach to searching protein database. J. Mol. Biol., 251:390–399, 1995.
Article Google Scholar
T. Jaakola, M. Diekhans, and D. Haussler. Using Fisher kernel method to detect remote protein homologies. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, pages 149–158. AAAI, AAAI Press, 1999.
Google Scholar
K. Karplus, C. Barrett, and R. Hughey. Hidden markov models for detecting remote protein homologies. Bioinformatics, 14(10):846–856, 1998. SAM-T98 paper.
Article Google Scholar
R. D. King, S. Muggleton, A. Srinivasan, and M. J. E. Sterberg. Structure-activity relationships derived by machine learning: The use of atoms and their bond connectivities to predict mutagenicity by inductive logic programming. Proc. Natl. Acad. Sci. USA, 93:438–442, 1996.
Article Google Scholar
Ross D. King and Ashwin Srinivasan. The discovery of indicator variables for qsar unsing inductive logic programming. Journal of Compter-Aided Molecular Design, 11:571–580, 1997.
Article Google Scholar
E. R. Kirk. Statistics: An Introduction. Hardcourt Brace College, USA, fourth edition, 1999.
Google Scholar
N. Lavrac and S. Dzeroski. Inductive Logic Programming: Techniques and Applications. Ellis Horwood, 1994.
Google Scholar
D. J. Lipman. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25:3389–3402, 1997.
Article Google Scholar
D. J. Lipman and W. R. Pearson. Rapid and sensitive protein similarity searches. Science, 277:1435–1441, March 1985.
Google Scholar
R. M. MacCallum, L. A. Kelley, and M. J. E. Sternberg. SAWTED: Structure Assignment With TExt Description-enhanced detection of remote homologues with automated SWISS-PROT annotation comparision. Bioinformatics, 16(2):125–129, 2000.
Article Google Scholar
Stephen Muggleton. Inductive logic programming. New Generation Computing, 8(4):295–318, 1990.
Article Google Scholar
Stephen Muggleton. Inverse entailment and progol. New Generation Computing Journal, 13:245–286, 1995.
Google Scholar
A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247:536–540, 1995.
Article Google Scholar
S. B. Needleman and C. D. Wunsch. A general method applicable to the research for similarities in the amino acid sequencesof two proteins. J. Mol. Biol., 48:443–453, 1970.
Article Google Scholar
H. Nielsen, J. Engelbrecht, S. Brunack, and G. von Heijne. Identification of prokaryotic and eukariotic signal peptides and prediction of their cleavage sites. Protein Engineering, 10:1–6, 1997.
Article Google Scholar
J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hubbard, and C. Chothia. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J. Mol. Biol., 284:1201–1210, 1998.
Article Google Scholar
J. Park, S. A. Teichmann, T. Hubbard, and C. Chotia. Intermediate sequences increase the detection of homology between sequences. J. Mol. Biol., 273:349–354, 1997.
Article Google Scholar
W. R. Pearson and D. J. Lipman. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA, pages 2444–2448, 1988.
Google Scholar
F. Provost, T. Fawcett, and R. Kohavi. The case against accuracy estimation for comparing induction algorithms. In Proc. 15th International Conf. on Machine Learning, pages 445–453. Morgan Kaufmann, San Francisco, CA, 1998.
Google Scholar
F. J. Provost and T. Fawcett. Robust classification systems for imprecise environments. In AAAI/IAAI, pages 706–713, 1998.
Google Scholar
Vijay Raghavan, Peter Bollmann, and Gwang S. Jung. A critical investigation of recall and presicion as measuers of retrievel system performance. ACM Transactions of Information Systems, 7(3):205–229, 1989.
Article Google Scholar
T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. J. Mol. Biol., 147:195–197, 1981. Smith, Waterman, dynamic, programming, local, alignment.
Article Google Scholar
J. A. Swets and R. M. Pickett. Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. Academic Press, New York, 1982.
Google Scholar
G. Tecuci. Building Intelligent Agents: An Apprenticeship Multistrategy Learning Theory, Methodology, Tool and Case Studies. Academic Press, 1998.
Google Scholar
M. Turcotte, Steven. H. Muggleton, and Micheal J. E. Sternberg. Application of inductive logic programming to discover rules governing the three-dimensional topology of protein structure. In C. D. Page, editor, Proc. 8th International Conference on Inductive Logic Programming (ILP-98), pages 53–64. Spinger Verlag, Berlin, 1998.
Chapter Google Scholar
H. L. Van Trees. Detection, estimation, and modulation theory. Wiley, New York, 1971.
MATH Google Scholar
W Wright, P. Scordis, and T. K. Attwood. BLAST PRINTS-alternative perspectives on sequence similarity. Bioinformatics, 15(6):523–524, 1999.
Article Google Scholar
P. Young. PrePRINTS. http://www.bioinf.man.ac.uk/ConceptualBlast.html.

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of Wales, Aberystwyth, SY23 3DB, Ceredigion, UK
Andreas Karwath & Ross D. King

Authors

Andreas Karwath
View author publications
You can also search for this author in PubMed Google Scholar
Ross D. King
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Université Paris Sud, LRI, bât. 490, 91405, Orsay, France
Céline Rouveirol
Ecole Polytechnique, LMS, 91128, Palaiseau, France
Michéle Sebag

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Karwath, A., King, R.D. (2001). An Automated ILP Server in the Field of Bioinformatics. In: Rouveirol, C., Sebag, M. (eds) Inductive Logic Programming. ILP 2001. Lecture Notes in Computer Science(), vol 2157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44797-0_8

Download citation

DOI: https://doi.org/10.1007/3-540-44797-0_8
Published: 30 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42538-0
Online ISBN: 978-3-540-44797-9
eBook Packages: Springer Book Archive

Publish with us

Policies and ethics