Mining Chemical Compound Structure Data Using Inductive Logic Programming

  • Conference paper
Active Mining

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3430)

Abstract

Discovering knowledge from chemical compound structure data is a challenging task in KDD. The aim is to generate hypotheses that describe the activities or characteristics of chemical compounds from their structures. Because each compound is composed of several parts with complicated relations among them, traditional mining algorithms cannot handle this kind of data efficiently. In this research, we apply Inductive Logic Programming (ILP) to classify chemical compounds. ILP yields comprehensible learning results and can handle complex data that include relations. Nevertheless, the bottleneck in learning first-order theories is the enormous hypothesis search space, which makes existing learning approaches inefficient compared to propositional approaches. We introduce an improved ILP approach that handles more efficiently a kind of data called multiple-part data, in which one instance consists of several parts as well as relations among the parts. The approach tries to find hypotheses describing the class of each training example by using both the individual and the relational characteristics of its parts, which is similar to finding common substructures among complex relational instances. Chemical compound data is multiple-part data: each compound is composed of atoms as parts and various kinds of bonds as relations among atoms. We apply the proposed algorithm to chemical compound structure data by conducting experiments on two real-world datasets: mutagenicity in nitroaromatic compounds and dopamine antagonist compounds. The experimental results are compared with those of previous approaches to show the performance of the proposed approach.
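
As a rough illustration of what multiple-part data looks like, the sketch below represents a compound as a set of atoms (parts) plus bonds (relations among parts) and checks whether a simple substructure pattern covers it. This is not the algorithm proposed in the paper; the names `Atom`, `Compound`, `covers`, and `coverage`, as well as the toy molecules and charge values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Atom:
    element: str   # individual characteristic of a part
    charge: float  # another individual characteristic (made-up values below)


@dataclass
class Compound:
    atoms: Dict[str, Atom]             # parts, keyed by a local atom id
    bonds: List[Tuple[str, str, str]]  # relations: (atom_id, atom_id, bond_type)
    active: bool                       # class label of the whole instance


def covers(compound: Compound, elem_a: str, elem_b: str, bond_type: str) -> bool:
    """True if some bond of `bond_type` links an `elem_a` atom to an `elem_b` atom."""
    for a, b, kind in compound.bonds:
        if kind != bond_type:
            continue
        pair = (compound.atoms[a].element, compound.atoms[b].element)
        # Bonds are undirected, so accept either orientation.
        if pair in ((elem_a, elem_b), (elem_b, elem_a)):
            return True
    return False


def coverage(compounds: List[Compound], elem_a: str, elem_b: str,
             bond_type: str) -> Tuple[int, int]:
    """Count (positive, negative) examples covered by the pattern."""
    pos = sum(1 for c in compounds
              if c.active and covers(c, elem_a, elem_b, bond_type))
    neg = sum(1 for c in compounds
              if not c.active and covers(c, elem_a, elem_b, bond_type))
    return pos, neg


# Two toy instances (hypothetical, not taken from the paper's datasets).
toy = [
    Compound({"a1": Atom("n", 0.8), "a2": Atom("o", -0.4), "a3": Atom("c", 0.0)},
             [("a1", "a2", "double"), ("a1", "a3", "single")], active=True),
    Compound({"a1": Atom("c", 0.0), "a2": Atom("c", 0.0)},
             [("a1", "a2", "aromatic")], active=False),
]

print(coverage(toy, "n", "o", "double"))  # prints (1, 0)
```

A pattern that covers many positive and few negative examples is the kind of common substructure a relational learner would prefer; a real ILP system searches a far larger space of first-order clauses than this single fixed pattern shape.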




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nattee, C., Sinthupinyo, S., Numao, M., Okada, T. (2005). Mining Chemical Compound Structure Data Using Inductive Logic Programming. In: Tsumoto, S., Yamaguchi, T., Numao, M., Motoda, H. (eds) Active Mining. Lecture Notes in Computer Science, vol 3430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11423270_6

  • DOI: https://doi.org/10.1007/11423270_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26157-5

  • Online ISBN: 978-3-540-31933-7

  • eBook Packages: Computer Science (R0)
