L-Modified ILP Evaluation Functions for Positive-Only Biological Grammar Learning

Mamer, Thierry; Bryant, Christopher H.; McCall, John

doi:10.1007/978-3-540-85928-4_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5194)

Included in the following conference series:

International Conference on Inductive Logic Programming

Abstract

We identify a shortcoming of a standard positive-only clause evaluation function in the context of learning biological grammars. To overcome this shortcoming we propose L-modification, a modification to this evaluation function that takes the lengths of individual examples into account. We use a set of bio-sequences known as neuropeptide precursor middles (NPP-middles). Grammars induced from these NPP-middles using L-modification perform better than those induced using the standard positive-only clause evaluation function. We also show that L-modification improves the performance of induced grammars when learning on short, medium or long NPP-middles. A potential disadvantage of L-modification is discussed. Finally, we show that the larger the limit on the search space size, the greater the increase in predictive performance arising from L-modification.

Author information

Authors and Affiliations

School of Computing, The Robert Gordon University, St. Andrews Street, AB25 1HG, Aberdeen, Scotland, UK
Thierry Mamer & John McCall
School of Computing, Science and Engineering, University of Salford, Newton Building, Salford, Greater Manchester, M5 4WT, UK
Christopher H. Bryant

Editor information

Filip Železný, Nada Lavrač

Cite this paper

Mamer, T., Bryant, C.H., McCall, J. (2008). L-Modified ILP Evaluation Functions for Positive-Only Biological Grammar Learning. In: Železný, F., Lavrač, N. (eds) Inductive Logic Programming. ILP 2008. Lecture Notes in Computer Science, vol 5194. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85928-4_16

DOI: https://doi.org/10.1007/978-3-540-85928-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85927-7
Online ISBN: 978-3-540-85928-4
eBook Packages: Computer Science, Computer Science (R0)
