Abstract
We are interested in using Inductive Logic Programming (ILP) to infer grammars representing sets of biological sequences. We call these biological grammars. ILP systems are well suited to this task in the sense that biological grammars have been represented as logic programs using the Definite Clause Grammar or the String Variable Grammar formalisms. However, the speed at which ILP systems can generate biological grammars has been shown to be a bottleneck. This paper presents a novel refinement operator implementation, specialised to infer biological grammars with ILP techniques. This implementation is shown to significantly speed-up inference times compared to the use of the classical refinement operator: time gains larger than 5-fold were observed in \(\frac{4}{5}\) of the experiments, and the maximum observed gain is over 300-fold.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L.L., Studholme, D.J., Yeats, C., Eddy, S.R.: The Pfam protein families database. Nucleic Acids Research 32, 138–141 (2004)
Bryant, C.H., Fredouille, D.: A parser for the efficient induction of biological grammars. In: Kramer, S., Pfahringer, B. (eds.) 15th International Conference on ILP: late-breaking paper track. University of Bonn (2005), http://wwwbib.informatik.tu-muenchen.de/infberichte/2005/TUM-I0510.idx
Bryant, C.H., Fredouille, D., Wilson, A., Jayawickreme, C.K., Jupe, S., Topp, S.: Pertinent background knowledge for learning protein grammars. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 54–65. Springer, Heidelberg (2006)
Cussens, J., Pulman, S.: Experiments in inductive chart parsing. In: Cussens, J. (ed.) LLL 1999, pp. 72–83, Bled, Slovenia (June 1999)
Dsouza, M., Larsen, N., Overbeek, R.: Searching for patterns in genomic data. Trends in Genetics 13(12), 497–498 (1997)
Falquet, L., Pagni, M., Bucher, P., Hulo, N., Sigrist, C.J., Hofmann, K., Bairoch, A.: Protein data bank. Nucleic Acid Research 30, 235–238 (2002)
Lavrač, N., Džeroski, S.: Inductive Logic Programming: Techniques and Applications. Ellis Hortwood, New York (1994)
Leung, S.-W., Mellish, C., Robertson, D.: Basic Gene Grammars and DNA-ChartParser for language processing of Escherichia coli promoter DNA sequences. Bioinformatics 17(3), 226–236 (2001)
Muggleton, S.H., Bryant, C.H., Srinivasan, A., Whittaker, A., Topp, S., Rawlings, C.: Are grammatical representations useful for learning from biological sequence data? – a case study. Journal of Computational Biology 5(8), 493–522 (2001)
Muggleton, S.H.: Inverse entailment and Progol. New Generation Computing, Special issue on Inductive Logic Programming 13(3-4), 245–286 (1995)
Muggleton, S.H.: Learning from positive data. In: Inductive Logic Programming. LNCS, vol. 1314, pp. 358–376. Springer, Heidelberg (1997)
Pulman, S., Cussens, J.: Grammar learning using inductive logic programming. Oxford University Working Papers in Linguistics, Philology and Phonetics 6, 31–45 (2001)
Pierce, K.L., Premont, R.T., Lefkowitz, R.J.: Seven-transmembrane receptors. Nat. Rev. Mol. Cell. Biol. 3(9,6), 39–50 (2002)
Pereira, F., Warren, D.H.D.: Definite clause grammars for language analysis – a survey of the formalism and a comparison with augmented transition networks. Artificial Intelligence 13(3), 231–278 (1980)
Sakakibara, Y., Brown, M., Hughey, R., Saira Mian, I.: Stochastic context-free grammars for tRNA modeling. Nucleic Acids Research 22, 5112–5120 (1994)
Searls, D.B.: String variable grammar: A logic grammar formalism for the biological language of DNA. Journal of logic Programming 12 (1993)
Srinivasan, A.: A Learning Engine for Proposing Hypotheses (Aleph) (1993), http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph
Tausend, B.: Representing biases for inductive logic programming. In: Bergadano, F., De Raedt, L. (eds.) Machine Learning: ECML-94. LNCS, vol. 784, pp. 427–430. Springer, Heidelberg (1994)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fredouille, D.C., Bryant, C.H., Jayawickreme, C.K., Jupe, S., Topp, S. (2007). An ILP Refinement Operator for Biological Grammar Learning. In: Muggleton, S., Otero, R., Tamaddoni-Nezhad, A. (eds) Inductive Logic Programming. ILP 2006. Lecture Notes in Computer Science(), vol 4455. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73847-3_24
Download citation
DOI: https://doi.org/10.1007/978-3-540-73847-3_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73846-6
Online ISBN: 978-3-540-73847-3
eBook Packages: Computer ScienceComputer Science (R0)