Abstract
“Traditional” clustering, in a broad sense, aims to organize objects into groups (clusters) whose members are “similar” to one another and “dissimilar” to objects in other groups. In contrast, in conceptual clustering, cluster formation is driven by the underlying structure of the data together with the description language available to the learner. This yields intelligible descriptions of the clusters and facilitates their interpretation.
We present a novel conceptual clustering system for multi-relational data, based on the popular k-medoids algorithm. Although clustering is generally not straightforward to evaluate, experiments on several applications are promising: clusters generated without class information agree very well with the true class labels of their members. Moreover, it was possible to obtain intelligible and meaningful descriptions of the clusters.
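As an illustration only, the general idea of k-medoids clustering followed by a cluster description can be sketched as below. This is not the system presented in the paper: the representation of each object as a set of feature names, the Jaccard distance, the fixed initial medoids, and the function names `jaccard_distance`, `k_medoids`, and `describe` are all assumptions made for the example, and the "description" is simply the set of features shared by every member of a cluster.

```python
def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|, with 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def k_medoids(items, initial_medoids, distance, max_iter=100):
    """Plain k-medoids over item indices.

    Assumes all items are pairwise distinct (distance > 0), so that every
    medoid is strictly closest to itself and no cluster becomes empty.
    Returns (medoids, clusters), where clusters maps a medoid index to the
    list of member indices.
    """
    n = len(items)
    # Precompute the full pairwise distance matrix.
    dist = [[distance(items[i], items[j]) for j in range(n)] for i in range(n)]

    def assign(medoids):
        # Assignment step: each item joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        return clusters

    medoids = list(initial_medoids)
    clusters = assign(medoids)
    for _ in range(max_iter):
        # Update step: the new medoid of a cluster is the member with the
        # smallest total distance to the other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
        clusters = assign(medoids)
    return medoids, clusters


def describe(items, members):
    """Hypothetical cluster description: features common to all members."""
    return set.intersection(*(items[i] for i in members))


# Toy data (invented for this example): each object is a set of features.
items = [
    {"atom", "bond", "ring"},
    {"atom", "bond"},
    {"atom", "bond", "ring", "charge"},
    {"gene", "protein"},
    {"gene", "protein", "pathway"},
    {"gene"},
]

medoids, clusters = k_medoids(items, initial_medoids=[0, 1],
                              distance=jaccard_distance)
descriptions = {m: describe(items, members) for m, members in clusters.items()}
for m, members in clusters.items():
    print(m, sorted(members), sorted(descriptions[m]))
```

Starting from two medoids in the same group, the update step moves one of them to the second group, and the two groups of toy objects end up in separate clusters. In a relational setting the feature sets and the distance would have to be derived from the data and the description language, which this sketch does not attempt.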
Keywords
- Inductive Logic Programming
- Conceptual Cluster
- Subgroup Discovery
- Inductive Logic Programming System
- Bottom Clause
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fonseca, N.A., Santos Costa, V., Camacho, R. (2012). Conceptual Clustering of Multi-Relational Data. In: Muggleton, S.H., Tamaddoni-Nezhad, A., Lisi, F.A. (eds) Inductive Logic Programming. ILP 2011. Lecture Notes in Computer Science, vol 7207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31951-8_16
DOI: https://doi.org/10.1007/978-3-642-31951-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31950-1
Online ISBN: 978-3-642-31951-8
eBook Packages: Computer Science, Computer Science (R0)