ECCLAT: a New Approach of Clusters Discovery in Categorical Data

  • Conference paper
Research and Development in Intelligent Systems XIX

Abstract

In this paper we present a new approach for discovering meaningful clusters in large categorical data sets (a common situation in, e.g., web data analysis). Our method, called ECCLAT (for Extraction of Clusters from Concepts LATtice), extracts a subset of concepts from the lattice of frequent closed itemsets, using an evaluation measure. ECCLAT is generic in that it can build approximate clusterings and discover meaningful clusters with slight overlap. The approach is illustrated on a classical data set and on web data analysis.
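The general idea of selecting clusters from frequent closed itemsets can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the evaluation measure, so the `score` function below (intent size times extent size) is a stand-in assumption, as are the toy data, the brute-force enumeration, the function names, and the `min_support` and `max_overlap` parameters.

```python
from itertools import combinations

# Toy categorical data (hypothetical): one set of attribute=value items per object.
transactions = [
    {"color=red", "shape=round", "size=small"},
    {"color=red", "shape=round", "size=large"},
    {"color=red", "shape=round", "size=small"},
    {"color=blue", "shape=square", "size=large"},
    {"color=blue", "shape=square", "size=small"},
]

def extent(itemset, data):
    """Indices of the transactions that contain every item of `itemset`."""
    return frozenset(i for i, t in enumerate(data) if itemset <= t)

def closure(itemset, data):
    """Items shared by all transactions that contain `itemset`."""
    ext = extent(itemset, data)
    if not ext:
        return frozenset(itemset)
    return frozenset(set.intersection(*(data[i] for i in ext)))

def frequent_closed_itemsets(data, min_support):
    """Brute-force enumeration (fine for toy data only).

    Returns a dict mapping each frequent closed itemset (the concept's
    intent) to its extent.
    """
    items = sorted(set().union(*data))
    concepts = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            ext = extent(candidate, data)
            if len(ext) >= min_support:
                concepts[closure(candidate, data)] = ext
    return concepts

def score(intent, ext):
    """Placeholder evaluation measure (an assumption, not ECCLAT's)."""
    return len(intent) * len(ext)

def select_clusters(concepts, max_overlap=0):
    """Greedily keep the best-scoring concepts whose extents overlap the
    already covered objects by at most `max_overlap` objects."""
    ranked = sorted(concepts.items(),
                    key=lambda c: (-score(*c), sorted(c[0])))
    covered, clusters = set(), []
    for intent, ext in ranked:
        if len(ext & covered) <= max_overlap:
            clusters.append((intent, ext))
            covered |= ext
    return clusters

concepts = frequent_closed_itemsets(transactions, min_support=2)
clusters = select_clusters(concepts, max_overlap=0)
for intent, ext in clusters:
    print(sorted(intent), sorted(ext))
```

On this toy data the sketch keeps two non-overlapping clusters, one described by `color=red, shape=round` and one by `color=blue, shape=square`. Raising `max_overlap` lets clusters share objects, which loosely mirrors the "slight overlap" mentioned in the abstract.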

Copyright information

© 2003 Springer-Verlag London Limited

About this paper

Cite this paper

Durand, N., Crémilleux, B. (2003). ECCLAT: a New Approach of Clusters Discovery in Categorical Data. In: Bramer, M., Preece, A., Coenen, F. (eds) Research and Development in Intelligent Systems XIX. Springer, London. https://doi.org/10.1007/978-1-4471-0651-7_13

  • DOI: https://doi.org/10.1007/978-1-4471-0651-7_13

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-674-5

  • Online ISBN: 978-1-4471-0651-7

  • eBook Packages: Springer Book Archive
