Mining Chemical Compound Structure Data Using Inductive Logic Programming

Nattee, Cholwich; Sinthupinyo, Sukree; Numao, Masayuki; Okada, Takashi

doi:10.1007/11423270_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3430)

Abstract

Discovering knowledge from chemical compound structure data is a challenging task in KDD. The aim is to generate hypotheses that describe the activities or characteristics of chemical compounds from their structures. Since each compound is composed of several parts with complicated relations among them, traditional mining algorithms cannot handle this kind of data efficiently. In this research, we apply Inductive Logic Programming (ILP) to classify chemical compounds. ILP makes learning results comprehensible and can handle more complex data that include relations. Nevertheless, the bottleneck in learning first-order theories is the enormous hypothesis search space, which makes existing learning approaches inefficient compared to propositional approaches. We introduce an improved ILP approach that handles more efficiently a kind of data called multiple-part data, in which one instance consists of several parts as well as relations among the parts. The approach tries to find hypotheses describing the class of each training example by using both the individual and the relational characteristics of its parts, which is similar to finding common substructures among complex relational instances. Chemical compound data is multiple-part data: each compound is composed of atoms as parts and various kinds of bonds as relations among atoms. We apply the proposed algorithm to chemical compound structures by conducting experiments on two real-world datasets: mutagenicity in nitroaromatic compounds and dopamine antagonist compounds. The experimental results are compared with those of previous approaches to show the performance of the proposed approach.
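
To make the notion of multiple-part data concrete, the following is a minimal Python sketch of how a compound could be held as parts (atoms) plus relations (bonds), and how a very simple shared feature could be found across instances. It is not the algorithm proposed in the chapter. The class names, the `bonded_pairs` and `common_bonded_pairs` helpers, and the toy structures are all assumptions made for illustration, and the "feature" used here (a pair of bonded elements with a bond type) is only a crude stand-in for a common substructure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One part of a compound (hypothetical attributes)."""
    atom_id: str
    element: str


@dataclass(frozen=True)
class Bond:
    """A relation between two parts, identified by their atom ids."""
    atom_a: str
    atom_b: str
    bond_type: str


@dataclass(frozen=True)
class Compound:
    """One multiple-part instance: parts plus relations among them."""
    name: str
    atoms: tuple
    bonds: tuple


def bonded_pairs(compound):
    """Return the set of (element, element, bond_type) features of a compound."""
    element_of = {atom.atom_id: atom.element for atom in compound.atoms}
    features = set()
    for bond in compound.bonds:
        # Sort the two elements so the feature ignores bond direction.
        first, second = sorted((element_of[bond.atom_a], element_of[bond.atom_b]))
        features.add((first, second, bond.bond_type))
    return features


def common_bonded_pairs(compounds):
    """Return the features present in every compound of a non-empty list."""
    feature_sets = [bonded_pairs(c) for c in compounds]
    return set.intersection(*feature_sets)


# Toy instances with invented structures, for illustration only.
toy_a = Compound(
    name="toy_a",
    atoms=(Atom("a1", "c"), Atom("a2", "n"), Atom("a3", "o")),
    bonds=(Bond("a1", "a2", "single"), Bond("a2", "a3", "double")),
)
toy_b = Compound(
    name="toy_b",
    atoms=(Atom("b1", "o"), Atom("b2", "n"), Atom("b3", "c"), Atom("b4", "c")),
    bonds=(Bond("b1", "b2", "double"), Bond("b3", "b4", "single")),
)

shared = common_bonded_pairs([toy_a, toy_b])
print(shared)  # {('n', 'o', 'double')}
```

In this toy case the only feature shared by both instances is a nitrogen and an oxygen joined by a double bond. A first-order learner would search a far richer space of hypotheses than this single-bond feature, which is the search-space cost the abstract refers to.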

Author information

Authors and Affiliations

The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka, 567-0047, Japan
Cholwich Nattee, Sukree Sinthupinyo & Masayuki Numao
Department of Informatics, School of Science and Technology, Kwansei Gakuin University, 2-1 Gakuen-cho, Sanda, Hyogo, 669-1323, Japan
Takashi Okada

Editor information

Editors and Affiliations

Shimane University, 89-1 Enya-cho Izumo, 6938501, Shimane, Japan
Shusaku Tsumoto
Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi Kohoku-ku, 223-8522, Yokohama, Japan
Takahira Yamaguchi
The Institute of Scientific and Industrial Research, Osaka University, Japan
Masayuki Numao
Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, 567-0047, Osaka, Japan
Hiroshi Motoda

About this paper

Cite this paper

Nattee, C., Sinthupinyo, S., Numao, M., Okada, T. (2005). Mining Chemical Compound Structure Data Using Inductive Logic Programming. In: Tsumoto, S., Yamaguchi, T., Numao, M., Motoda, H. (eds) Active Mining. Lecture Notes in Computer Science, vol 3430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11423270_6

DOI: https://doi.org/10.1007/11423270_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26157-5
Online ISBN: 978-3-540-31933-7
eBook Packages: Computer Science, Computer Science (R0)
