Abstract
Similarity of graphs with labeled vertices and edges is naturally defined in terms of maximal common subgraphs. To avoid computational overload, a parameterized technique for approximating graphs and their similarity is used. A lattice-based method of binarizing labeled graphs that respects the similarity operation on graph sets is proposed. This method allows one to compute graph similarity by means of algorithms for computing closed sets. Results of several computer experiments in predicting the biological activity of chemical compounds with the proposed technique testify in favour of graph approximations over complete graph representations: the gain in efficiency comes at (almost) no loss in accuracy.
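The core idea can be illustrated with a small sketch. Once each graph is approximated by a set of fragments, the similarity of two graphs becomes set intersection, and ordinary closed-set enumeration over the resulting binary context applies. This is a minimal illustration under stated assumptions. The projection used here treats each labeled edge as one fragment (`edge_fragments` is a hypothetical name), which is far coarser than the parameterized approximations in the paper. All function names are illustrative. The closed-set enumeration is a naive intersection closure, not an optimized closed-set algorithm.

```python
# Hypothetical sketch: graphs approximated by sets of labeled-edge fragments.
from typing import Dict, FrozenSet, List, Set, Tuple

Fragment = Tuple[str, str, str]  # (vertex label, edge label, vertex label)
Edge = Tuple[int, int, str]      # (u, v, edge label)


def edge_fragments(vertex_labels: Dict[int, str], edges: List[Edge]) -> FrozenSet[Fragment]:
    """Assumed 1-edge projection: binarize a labeled graph into the set of its labeled edges."""
    frags = set()
    for u, v, edge_label in edges:
        # Sort endpoint labels so an undirected edge yields one canonical fragment.
        a, b = sorted((vertex_labels[u], vertex_labels[v]))
        frags.add((a, edge_label, b))
    return frozenset(frags)


def similarity(x: FrozenSet[Fragment], y: FrozenSet[Fragment]) -> FrozenSet[Fragment]:
    """In this projection, the similarity of two graphs is the intersection of their fragment sets."""
    return x & y


def closed_sets(intents: List[FrozenSet[Fragment]]) -> Set[FrozenSet[Fragment]]:
    """Naive closure: every intersection of object intents, plus the full attribute set."""
    all_attrs = frozenset().union(*intents)
    closed = {all_attrs}  # intent of the empty set of graphs
    for intent in intents:
        # The comprehension is built before the in-place union, so `closed` is not mutated while iterated.
        closed |= {intent & c for c in closed}
    return closed


def extent(closed: FrozenSet[Fragment], intents: List[FrozenSet[Fragment]]) -> List[int]:
    """Indices of the graphs whose fragment sets contain the given closed set."""
    return [i for i, intent in enumerate(intents) if closed <= intent]


# Toy molecule-like graphs (illustrative data only, not from the paper's experiments).
graphs = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "single")]),
    ({0: "C", 1: "O"}, [(0, 1, "double")]),
    ({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "double")]),
]
intents = [edge_fragments(labels, edges) for labels, edges in graphs]
lattice = closed_sets(intents)

for c in sorted(lattice, key=len):
    print(sorted(c), "->", extent(c, intents))
```

For these three toy graphs, the sketch yields six closed sets. One example is `{('C', 'single', 'C')}`, which is shared by graphs 0 and 2. Because similarity is reduced to set intersection in this sketch, any standard algorithm for enumerating closed itemsets could replace the naive `closed_sets` loop.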
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kuznetsov, S.O., Samokhin, M.V. (2005). Learning Closed Sets of Labeled Graphs for Chemical Applications. In: Kramer, S., Pfahringer, B. (eds) Inductive Logic Programming. ILP 2005. Lecture Notes in Computer Science, vol 3625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11536314_12
DOI: https://doi.org/10.1007/11536314_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28177-1
Online ISBN: 978-3-540-31851-4
eBook Packages: Computer Science, Computer Science (R0)