Skip to main content

A Fast Spectral Clustering Method Based on Growing Vector Quantization for Large Data Sets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 8347))

Abstract

Spectral clustering is a flexible clustering algorithm that can produce high-quality clusters on small scale data sets, but it is limited applicable to large scale data sets because it needs O(n 3) computational operations to process a data set of n data points[1]. Based on the minimization of the increment of distortion, we tackle this problem by developing a novel efficient growing vector quantization method to preprocess a large scale data set, which can compress the original data set into a small set of representative data points in one scan of the original data set. Then we apply spectral clustering algorithm to the small set. Experiments on real data sets show that our method provides fast and accurate clustering results.

This work is supported by National Natural Science Foundation of China under Grant No. 61003311, Jiangsu Provincial Key Laboratory of Network and Information Security Grant No. BM2003201-201006, Anhui Provincial Natural Science Research Key Project of China under Grant No. KJ2011A040, Youth Foundation of Anhui University of Technology under Grant No. QZ201316.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Yan, D., Huang, L., Jordan, M.I.: Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 907–916. ACM (2009)

    Chapter  Google Scholar 

  2. Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Proceedings of the Advances in Neural Information Processing Systems, pp. 849–856. MIT Press, Cambridge (2002)

    Google Scholar 

  3. Equitz, W.H.: A new vector quantization clustering algorithm. IEEE Transactions on Acoustics, Speech and Signal Processing 37(10), 1568–1575 (1989)

    Article  Google Scholar 

  4. Vasuki, A., Vanathi, P.T.: A review of vector quantization techniques. IEEE Potentials 25(4), 39–47 (2006)

    Article  Google Scholar 

  5. Lewis, D.D., Yang, Y., Rose, T.G., et al.: Rcv1: A new benchmark collection for text categorization research. The Journal of Machine Learning Research 5, 361–397 (2004)

    Google Scholar 

  6. Chen, W.Y., Song, Y., Bai, H., et al.: Parallel spectral clustering in distributed systems. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(3), 568–586 (2011)

    Article  Google Scholar 

  7. Zelnik-Manor, L., Perona, P.: Self-tuning spectral clustering. In: Advances in Neural Information Processing Systems, pp. 1601–1608 (2004)

    Google Scholar 

  8. Von Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing 17(4), 395–416 (2007)

    Article  MathSciNet  Google Scholar 

  9. Wauthier, F.L., Jojic, N., Jordan, M.I.: Active spectral clustering via iterative uncertainty reduction. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1339–1347. ACM (2012)

    Google Scholar 

  10. Ochs, P., Brox, T.: Higher order motion models and spectral clustering. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 614–621. IEEE (2012)

    Google Scholar 

  11. Pagès, G., Wilbertz, B.: Intrinsic stationarity for vector quantization: Foundation of dual quantization. SIAM Journal on Numerical Analysis 50(2), 747–780 (2012)

    Article  MATH  MathSciNet  Google Scholar 

  12. Kästner, M., Hammer, B., Biehl, M., et al.: Functional relevance learning in generalized learning vector quantization. Neurocomputing 90, 85–95 (2012)

    Article  Google Scholar 

  13. Dasgupta, S., Freund, Y.: Random projection trees for vector quantization. IEEE Transactions on Information Theory 55(7), 3229–3242 (2009)

    Article  MathSciNet  Google Scholar 

  14. Qian, P., Chung, F.L., Wang, S., et al.: Fast Graph-Based Relaxed Clustering for Large Data Sets Using Minimal Enclosing Ball. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 42(3), 672–687 (2012)

    Article  Google Scholar 

  15. Deng, Z., Choi, K.S., Chung, F.L., et al.: Enhanced soft subspace clustering integrating within-cluster and between-cluster information. Pattern Recognition 43(3), 767–781 (2010)

    Article  MATH  Google Scholar 

  16. Wang, S., Chung, K.F.-L., Deng, Z., Hu, D., Wu, X.: Robust maximum entropy clustering algorithm with its labeling for outliers. Soft Comput. 10(7), 555–563 (2006)

    Article  MATH  Google Scholar 

  17. Wang, S., Chung, K.F.-L., Deng, Z., Hu, D.: Robust fuzzy clustering neural network based on epsilon-insensitive loss function. Appl. Soft Comput. 7(2), 577–584 (2007)

    Article  Google Scholar 

  18. Lee, C.H., Zaïane, O.R., Park, H.H., et al.: Clustering high dimensional data: A graph-based relaxed optimization approach. Information Sciences 178(23), 4501–4511 (2008)

    Article  MathSciNet  Google Scholar 

  19. Cao, J., Wu, Z., Wu, J., et al.: SAIL: Summation-bAsed Incremental Learning for Information-Theoretic Text Clustering (2013)

    Google Scholar 

  20. Cao, J., Wu, Z., Wu, J., et al.: Towards information-theoretic K-means clustering for image indexing. Signal Processing (2012)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, X., Zheng, X., Qin, F., Zhao, B. (2013). A Fast Spectral Clustering Method Based on Growing Vector Quantization for Large Data Sets. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science(), vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-53917-6_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53916-9

  • Online ISBN: 978-3-642-53917-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics