An ILP Refinement Operator for Biological Grammar Learning

Fredouille, Daniel C.; Bryant, Christopher H.; Jayawickreme, Channa K.; Jupe, Steven; Topp, Simon

doi:10.1007/978-3-540-73847-3_24

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4455)

Abstract

We are interested in using Inductive Logic Programming (ILP) to infer grammars representing sets of biological sequences. We call these biological grammars. ILP systems are well suited to this task in that biological grammars have already been represented as logic programs using the Definite Clause Grammar and String Variable Grammar formalisms. However, the speed at which ILP systems generate biological grammars has been shown to be a bottleneck. This paper presents a novel refinement operator implementation, specialised for inferring biological grammars with ILP techniques. This implementation is shown to speed up inference significantly compared with the classical refinement operator: time gains larger than 5-fold were observed in \(\frac{4}{5}\) of the experiments, and the maximum observed gain is over 300-fold.

Author information

Authors and Affiliations

School of Computing, The Robert Gordon University, Aberdeen, UK
Daniel C. Fredouille & Christopher H. Bryant
Discovery Research Biology, GlaxoSmithKline, Durham, USA
Channa K. Jayawickreme
Department of Bioinformatics, GlaxoSmithKline, Stevenage, UK
Steven Jupe
Department of Bioinformatics, GlaxoSmithKline, Harlow, UK
Simon Topp

Editor information

Stephen Muggleton, Ramon Otero, Alireza Tamaddoni-Nezhad

Cite this paper

Fredouille, D.C., Bryant, C.H., Jayawickreme, C.K., Jupe, S., Topp, S. (2007). An ILP Refinement Operator for Biological Grammar Learning. In: Muggleton, S., Otero, R., Tamaddoni-Nezhad, A. (eds) Inductive Logic Programming. ILP 2006. Lecture Notes in Computer Science, vol 4455. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73847-3_24

DOI: https://doi.org/10.1007/978-3-540-73847-3_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73846-6
Online ISBN: 978-3-540-73847-3
eBook Packages: Computer Science, Computer Science (R0)
