Abstract
We identify a shortcoming of a standard positive-only clause evaluation function in the context of learning biological grammars. To overcome this shortcoming we propose L-modification, a modification to this evaluation function that takes the lengths of individual examples into account. We use a set of bio-sequences known as neuropeptide precursor middles (NPP-middles). Grammars induced from these NPP-middles using L-modification perform better than those induced using the standard positive-only clause evaluation function. We also show that L-modification improves the performance of induced grammars when learning on short, medium or long NPP-middles. A potential disadvantage of L-modification is discussed. Finally, we show that the increase in predictive performance arising from L-modification grows as the limit on the search space size increases.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mamer, T., Bryant, C.H., McCall, J. (2008). L-Modified ILP Evaluation Functions for Positive-Only Biological Grammar Learning. In: Železný, F., Lavrač, N. (eds) Inductive Logic Programming. ILP 2008. Lecture Notes in Computer Science, vol 5194. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85928-4_16
DOI: https://doi.org/10.1007/978-3-540-85928-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85927-7
Online ISBN: 978-3-540-85928-4
eBook Packages: Computer Science (R0)