Abstract
Document clustering methods can be used to structure large sets of text or hypertext documents. Suffix Tree Clustering (STC) has proved to be a good approach to document clustering. However, the cluster merging algorithm of STC is based solely on the overlap of the clusters' document sets, which ignores any similarity between the non-overlapping parts of different clusters. In this paper, we introduce a novel cluster merging approach that combines cosine similarity with overlap percentage. Using this method, we obtain a better clustering result and a comparatively small number of clusters.
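The abstract does not give the exact rule used to combine the two measures, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a weighted sum of the overlap percentage and the cosine similarity between the term-frequency centroids of the non-overlapping documents. The function names, the `weight` and `score_min` parameters, and their default values are all hypothetical.

```python
import math
from collections import Counter


def overlap(docs_a, docs_b):
    """Shared documents as a fraction of the larger cluster.

    Requiring this value to exceed 0.5 is equivalent to requiring the
    shared part to exceed half of each cluster.
    """
    return len(docs_a & docs_b) / max(len(docs_a), len(docs_b))


def centroid(doc_ids, term_vectors):
    """Sum the term-frequency vectors of the given documents."""
    total = Counter()
    for doc_id in doc_ids:
        total.update(term_vectors[doc_id])
    return total


def cosine(u, v):
    """Cosine similarity of two sparse term vectors (dict-like)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def should_merge(docs_a, docs_b, term_vectors, weight=0.5, score_min=0.6):
    """Decide whether two base clusters (sets of document ids) merge.

    ASSUMPTION: the weighted sum and both default values are
    illustrative; the source does not specify the combination rule.
    """
    if not docs_a or not docs_b:
        return False
    ov = overlap(docs_a, docs_b)
    only_a, only_b = docs_a - docs_b, docs_b - docs_a
    if not only_a or not only_b:
        # One cluster contains the other: fall back to overlap alone.
        return ov > 0.5
    sim = cosine(centroid(only_a, term_vectors),
                 centroid(only_b, term_vectors))
    return weight * ov + (1 - weight) * sim >= score_min
```

Under these assumptions, two clusters that share only half of their documents can still merge when their remaining documents use similar vocabulary, whereas a rule based on overlap alone would keep them apart.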
Copyright information
© 2006 International Federation for Information Processing
About this paper
Cite this paper
Wang, J., Li, R. (2006). A New Cluster Merging Algorithm of Suffix tree Clustering. In: Shi, Z., Shimohara, K., Feng, D. (eds) Intelligent Information Processing III. IIP 2006. IFIP International Federation for Information Processing, vol 228. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-44641-7_21
DOI: https://doi.org/10.1007/978-0-387-44641-7_21
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-44639-4
Online ISBN: 978-0-387-44641-7
eBook Packages: Computer Science, Computer Science (R0)