
Extending K-Means Clustering to First-Order Representations

  • Conference paper
Inductive Logic Programming (ILP 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1866)

Abstract

In this paper, we present an in-depth evaluation of two approaches to extending k-means clustering to first-order representations. The first approach, k-medoids, selects its cluster centers from the given set of instances and is thus limited in its choice of centers. The second approach, k-prototypes, uses a heuristic prototype construction algorithm that can generate new centers. The two approaches are evaluated empirically on a standard benchmark problem with respect to clustering quality and convergence. The results show that, in this case, the k-medoids approach is indeed a viable and fast alternative to existing agglomerative or top-down clustering approaches, even for a small-scale dataset, while k-prototypes exhibited a number of deficiencies.
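
To make the k-medoids idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes only that a pairwise distance matrix is available; in a first-order setting that matrix would come from a relational distance measure, whereas here the absolute difference between numbers stands in for it. The names `assign` and `k_medoids` and the toy data are hypothetical, chosen for illustration.

```python
import random


def assign(dist, medoids):
    """Assign every instance index to its closest medoid."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        nearest = min(medoids, key=lambda m: dist[i][m])
        clusters[nearest].append(i)
    return clusters


def k_medoids(dist, k, max_iter=100, seed=0):
    """Cluster instances 0..n-1 using only a pairwise distance matrix.

    Assumption: dist is symmetric with a zero diagonal. Centers are
    always chosen among the given instances, never constructed.
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(dist)), k)
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # Update step: the new center of a cluster is the member with the
        # smallest total distance to the other members (an empty cluster
        # keeps its previous medoid).
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break  # converged: the set of centers no longer changes
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy stand-in for a first-order distance: absolute difference of numbers.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, clusters = k_medoids(dist, k=2)
print(sorted(medoids), clusters)
```

Because the procedure only ever reads entries of `dist`, it needs no notion of averaging instances, which is what makes the medoid variant straightforward to apply when instances are structured objects rather than numeric vectors.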



Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kirsten, M., Wrobel, S. (2000). Extending K-Means Clustering to First-Order Representations. In: Cussens, J., Frisch, A. (eds) Inductive Logic Programming. ILP 2000. Lecture Notes in Computer Science, vol 1866. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44960-4_7

  • DOI: https://doi.org/10.1007/3-540-44960-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67795-6

  • Online ISBN: 978-3-540-44960-7

  • eBook Packages: Springer Book Archive
