Improving the Dynamic Hierarchical Compact Clustering Algorithm by Using Feature Selection

  • Reynaldo Gil-García
  • Aurora Pons-Porrata
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6419)

Abstract

Feature selection has improved the performance of text clustering. In this paper, a local feature selection technique is incorporated in the dynamic hierarchical compact clustering algorithm to speed up the computation of similarities. We also present a quality measure to evaluate hierarchical clustering that considers the cost of finding the optimal cluster from the root. The experimental results on several benchmark text collections show that the proposed method is faster than the original algorithm while achieving approximately the same clustering quality.
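
The abstract does not state which selection criterion is used or how it is wired into the clustering algorithm. As a rough illustration only of why feature selection can make similarity computation cheaper, the sketch below keeps the k highest-weighted terms of each sparse document vector and computes the cosine similarity over the reduced vectors. The function names, the top-k criterion, and the toy weights are all assumptions for illustration and are not taken from the paper.

```python
import math
from typing import Dict

# Sparse term-weight vector: term -> weight (e.g. TF-IDF).
Vector = Dict[str, float]


def select_top_features(vec: Vector, k: int) -> Vector:
    """Keep the k highest-weighted terms.

    Hypothetical criterion chosen for illustration; the paper's
    actual selection technique is not described in the abstract.
    """
    top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(top)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    # Iterate over the shorter vector so the dot product costs
    # O(min(|a|, |b|)) dictionary lookups.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents with made-up weights.
doc_a = {"cluster": 0.8, "text": 0.5, "the": 0.1, "of": 0.05}
doc_b = {"cluster": 0.7, "feature": 0.6, "the": 0.2, "and": 0.05}

full = cosine(doc_a, doc_b)
reduced = cosine(select_top_features(doc_a, 2), select_top_features(doc_b, 2))
print(f"full={full:.3f} reduced={reduced:.3f}")
```

With fewer terms per vector, each similarity needs fewer lookups and multiplications. The reduced similarity is an approximation of the full one, which is consistent with the abstract's claim of approximately the same clustering quality at lower cost.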

Keywords

Feature Selection; Cluster Quality; Optimal Cluster; Document Cluster; Hierarchical Cluster Algorithm

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Reynaldo Gil-García (1)
  • Aurora Pons-Porrata (1)
  1. Center for Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba
