
Cluster Summarization with Dense Region Detection

  • Conference paper
  • First Online:
Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2014)

Abstract

This paper introduces a new approach that summarizes clusters by finding dense regions and representing each cluster as a Gaussian Mixture Model (GMM). The GMM summary is compact, and the original data can be regenerated from it with high accuracy. The classical representation describes a cluster by a center and a radius. The proposed approach instead preserves the shape of each cluster as well as the distribution of its samples. A GMM is a parametric model whose key parameter is the number of Gaussian components, and we propose a method to find this number automatically. A GMM can summarize a cluster produced by any clustering algorithm. When a new sample is presented, a membership value is calculated for each cluster from its GMM, and the sample is assigned to the closest cluster according to these values. Using GMMs to summarize clusters offers advantages in accuracy, detection rate, memory efficiency and time complexity. We evaluate the proposed method on a variety of datasets, including synthetic data and real datasets from the UCI repository. We assess the quality of the summarized clusters using the Dunn, DB, SD and SSD indexes, and compare the scores with those of the well-known ABACUS method. We also apply the proposed algorithm to anomaly detection. We measure its false alarm and detection rates and compare it with Negative Selection, Naïve models, and ABACUS. In addition, we compare its memory usage and processing time with those of the other algorithms. The results show that our algorithm outperforms these well-known anomaly detection algorithms in accuracy, detection rate, memory usage and processing time.
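The abstract does not give the algorithm in detail, so what follows is only a minimal sketch of the general idea, built on several assumptions. Each cluster is summarized by a diagonal-covariance GMM fitted with plain EM. The number of components is chosen by BIC, a standard criterion swapped in here. The paper proposes its own automatic, dense-region-based method, which this sketch does not reproduce. A sample's membership value for a cluster is taken to be its log-likelihood under that cluster's GMM. Regeneration is ordinary ancestral sampling. All function names (`fit_gmm`, `summarize_cluster`, `assign`, `regenerate`) are hypothetical.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance at point x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def gmm_logpdf(x, gmm):
    """Log-likelihood of one sample under a (weights, means, variances) GMM."""
    weights, means, variances = gmm
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                      for w, m, v in zip(weights, means, variances)])

def fit_gmm(points, k, iters=100, min_var=1e-6):
    """Fit a k-component diagonal-covariance GMM to one cluster with plain EM."""
    n, d = len(points), len(points[0])
    order = sorted(points)  # deterministic init: means spread over sorted points
    means = [list(order[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    centre = [sum(p[i] for p in points) / n for i in range(d)]
    var0 = [max(sum((p[i] - centre[i]) ** 2 for p in points) / n, min_var)
            for i in range(d)]
    variances = [list(var0) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for p in points:
            logs = [math.log(w) + log_gauss_diag(p, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = logsumexp(logs)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: re-estimate weights, means and per-dimension variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # avoid empty components
            weights[j] = nj / n
            means[j] = [sum(r[j] * p[i] for r, p in zip(resp, points)) / nj
                        for i in range(d)]
            variances[j] = [max(sum(r[j] * (p[i] - means[j][i]) ** 2
                                    for r, p in zip(resp, points)) / nj, min_var)
                            for i in range(d)]
    return weights, means, variances

def summarize_cluster(points, k_max=4):
    """Pick the number of components by BIC (a stand-in criterion) and fit."""
    n, d = len(points), len(points[0])
    best, best_bic = None, math.inf
    for k in range(1, min(k_max, n) + 1):
        gmm = fit_gmm(points, k)
        loglik = sum(gmm_logpdf(p, gmm) for p in points)
        n_params = k * (2 * d + 1) - 1  # weights + means + diagonal variances
        bic = -2 * loglik + n_params * math.log(n)
        if bic < best_bic:
            best, best_bic = gmm, bic
    return best

def assign(x, summaries):
    """Membership = log-likelihood under each cluster's GMM; pick the largest."""
    scores = {label: gmm_logpdf(x, gmm) for label, gmm in summaries.items()}
    return max(scores, key=scores.get), scores

def regenerate(gmm, count, rng):
    """Draw synthetic samples from a cluster summary (ancestral sampling)."""
    weights, means, variances = gmm
    out = []
    for _ in range(count):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        out.append([rng.gauss(m, math.sqrt(v))
                    for m, v in zip(means[j], variances[j])])
    return out

rng = random.Random(0)
clusters = {
    "A": [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)],
    "B": [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(100)],
}
summaries = {label: summarize_cluster(pts) for label, pts in clusters.items()}
print(assign([0.5, -0.2], summaries)[0])  # A
```

Only the per-cluster weights, means and variances need to be stored once the summaries are built. This reflects the memory argument in the abstract. In an anomaly-detection setting one might also flag a sample whose best membership score falls below a chosen threshold; that threshold rule is likewise an assumption and is not taken from the paper.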

References

  1. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD (1996)

  2. Wang, W., Yang, J., Muntz, R.R.: STING: a statistical information grid approach to spatial data mining. San Francisco (1997)

  3. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques. The Morgan Kaufmann Series in Data Management Systems, 3rd edn. Morgan Kaufmann Publishers, Burlington (2006)

  4. MacQueen, J.B.: Some methods for classification and analysis of multivariate observations (1967)

  5. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Upper Saddle River (1988)

  6. Kaufman, L., Rousseeuw, P.J.: Clustering by means of medoids. In: Dodge, Y. (ed.) Statistical Data Analysis Based on the L1-Norm and Related Methods. North-Holland (1987)

  7. Karypis, G., Han, E.-H., Kumar, V.: CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. IEEE Comput. 32(8), 68–75 (1999)

  8. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley, New York (1990)

  9. Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications (1998)

  10. Hinneburg, A., Keim, D.A.: An efficient approach to clustering in large multimedia databases with noise (1998)

  11. Guha, S., Meyerson, A., Mishra, N., Motwani, R.: Clustering data streams: theory and practice. IEEE Trans. Knowl. Data Eng. 15(3), 505–528 (2003)

  12. Bifet, A., Holmes, G., Pfahringer, B.: New ensemble methods for evolving data streams (2009)

  13. Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A framework for clustering evolving data streams (2003)

  14. Yang, D., Rundensteiner, E.A., Ward, M.O.: Summarization and matching of density-based clusters in streaming environments. Proc. VLDB Endowment 5(2), 121–132 (2011)

  15. Cao, F., Ester, M., Qian, W., Zhou, A.: Density-based clustering over an evolving data stream with noise. In: SIAM Conference on Data Mining (2006)

  16. Chaoji, V., Li, W., Yildirim, H., Zaki, M.: ABACUS: mining arbitrary shaped clusters from large datasets based on backbone identification. In: SIAM/Omnipress (2011)

  17. He, Z., Xu, X., Deng, S.: Discovering cluster-based local outliers. Pattern Recogn. Lett. 24, 1641–1650 (2003)

  18. Gaddam, S., Phoha, V., Balagani, K.: K-means+ID3: a novel method for supervised anomaly detection by cascading k-means clustering and ID3 decision tree learning methods. IEEE Trans. Knowl. Data Eng. 19(3), 345–354 (2007)

  19. Mohammadi, M., Akbari, A., Raahemi, B., Nasersharif, B., Asgharian, H.: A fast anomaly detection system using probabilistic artificial immune algorithm capable of learning new attacks. Evol. Intel. 6(3), 135–156 (2014)

  20. Kersting, K., Wahabzada, M., Thurau, C., Bauckhage, C.: Hierarchical convex NMF for clustering massive data (2010)

  21. Hershberger, J., Shrivastava, N., Suri, S.: Summarizing spatial data streams using ClusterHulls. J. Exp. Algorithmics (JEA) 13 (2009). doi:10.1145/1412228.1412238

  22. Mohammadi, M., Akbari, A., Raahemi, B., Nasersharif, B., Asgharian, H.: A fast anomaly detection system using probabilistic artificial immune algorithm capable of learning new attacks. Evol. Intel. 6(5), 135–156 (2014)

  23. Gaddam, S., Phoha, V., Balagani, K.: K-means+ID3: a novel method for supervised anomaly detection by cascading k-means clustering and ID3 decision tree learning methods. IEEE Trans. Knowl. Data Eng. 19(3), 345–354 (2007)

  24. Dunn, J.C.: Well separated clusters and optimal fuzzy partitions. J. Cybern. 4, 95–104 (1974)

  25. Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1(4), 224–227 (1979)

  26. Halkidi, M., Vazirgiannis, M., Batistakis, Y.: Quality scheme assessment in the clustering process. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 265–276. Springer, Heidelberg (2000)

  27. Sandel, P.C., Monroe, J.G.: Negative selection of immature B cells by receptor editing or deletion is determined by site of antigen encounter. Immunity 10(3), 289–299 (1999)

Acknowledgement

This research was supported by NSERC Canada, Grant No. RGPIN/341811-2012.

Author information

Correspondence to Elnaz Bigdeli.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bigdeli, E., Mohammadi, M., Raahemi, B., Matwin, S. (2015). Cluster Summarization with Dense Region Detection. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2014. Communications in Computer and Information Science, vol 553. Springer, Cham. https://doi.org/10.1007/978-3-319-25840-9_5

  • DOI: https://doi.org/10.1007/978-3-319-25840-9_5

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25839-3

  • Online ISBN: 978-3-319-25840-9

  • eBook Packages: Computer Science, Computer Science (R0)
