
A Hybrid Incremental Clustering Method-Combining Support Vector Machine and Enhanced Clustering by Committee Clustering Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4426)

Abstract

This study proposes a new hybrid incremental clustering method that combines a Support Vector Machine (SVM) with an enhanced Clustering by Committee (CBC) algorithm. The SVM first classifies each incoming document to determine whether it belongs to one of the existing classes. The enhanced CBC algorithm then clusters the documents that remain unclassified. Using the SVM in this way significantly reduces both the amount of computation and the noise in clustering. The enhanced CBC algorithm effectively controls the number of clusters, improves performance, and lets the number of classes grow gradually from the structure of the current classes, without clustering all of the documents again. In the empirical results, the proposed method outperforms both the enhanced CBC clustering method and other algorithms, and the enhanced CBC clustering method in turn outperforms the original CBC.
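The classify-then-cluster flow described in the abstract can be illustrated with a minimal sketch. The abstract gives no details of the SVM configuration or of the enhancements to CBC, so both are replaced here by simple stand-ins, which are assumptions and not the authors' method: cosine similarity to class centroids stands in for the SVM decision, and a greedy single-pass threshold clustering stands in for the enhanced CBC step. All function names and the `accept` and `join` thresholds are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Mean vector of a non-empty list of sparse term-weight dicts."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def route(doc, centroids, accept):
    """Stand-in for the SVM step (assumption): return the best-matching
    existing class label, or None if no class scores at least `accept`."""
    best_label, best_score = None, accept
    for label, c in centroids.items():
        score = cosine(doc, c)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label

def cluster_leftovers(docs, join):
    """Stand-in for the enhanced CBC step (assumption): greedy
    single-pass clustering of the documents no class accepted."""
    clusters = []
    for d in docs:
        for members in clusters:
            if cosine(d, centroid(members)) >= join:
                members.append(d)
                break
        else:
            clusters.append([d])
    return clusters

def incremental_step(classes, incoming, accept=0.5, join=0.5):
    """Assign incoming documents to existing classes where possible,
    then form new classes only from the unclassified remainder."""
    centroids = {label: centroid(m) for label, m in classes.items()}
    leftovers = []
    for doc in incoming:
        label = route(doc, centroids, accept)
        if label is None:
            leftovers.append(doc)
        else:
            classes[label].append(doc)
    # Only the leftovers are clustered; existing classes are not rebuilt.
    for members in cluster_leftovers(leftovers, join):
        classes[f"new_{len(classes)}"] = members
    return classes
```

Calling `incremental_step(classes, incoming)` with `classes` as a dict mapping labels to lists of term-weight dicts updates the classes in place and returns them. The point of the sketch is the control flow: only documents rejected by the classifier reach the clustering step, which is why the abstract can claim reduced computation and no need to re-cluster the full collection.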



Editor information

Zhi-Hua Zhou, Hang Li, Qiang Yang


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Chiu, DY., Hsieh, KL. (2007). A Hybrid Incremental Clustering Method-Combining Support Vector Machine and Enhanced Clustering by Committee Clustering Algorithm. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_47


  • DOI: https://doi.org/10.1007/978-3-540-71701-0_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71700-3

  • Online ISBN: 978-3-540-71701-0

  • eBook Packages: Computer Science, Computer Science (R0)
