Abstract
This work addresses the determination of meaningful and terse labels for news document clusters. We analyze a number of alternatives for selecting headlines and/or sentences of documents in a document cluster (obtained as a result of an entity-event-duration query), and formalize an approach to extracting a short phrase, from well-supported headlines/sentences of the cluster, that can serve as the cluster label. To compare sentences, our technique maps each sentence to a set of significant stems that approximates its semantics. Finally, a cluster label is extracted from a selected headline/sentence as a contiguous sequence of words, restoring the word-order information lost in the formalization of semantic equivalence.
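The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the suffix-stripping `crude_stem`, the "appears in more than half of the headlines" support threshold, and the shortest-covering-window rule in `select_label` are all assumptions made for illustration. The sketch only shows the general idea of comparing sentences as sets of stems and then recovering a contiguous phrase from one chosen headline.

```python
import re
from collections import Counter

# Assumed stop-word list; the abstract does not say which words count as insignificant.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and",
              "as", "at", "by", "is", "are", "with", "after"}
SUFFIXES = ("ing", "ed", "es", "s")  # crude stand-in for a real stemmer


def crude_stem(word):
    """Lower-case a word and strip one common suffix (hypothetical stemmer)."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(sentence):
    """Split a sentence into alphanumeric words, keeping the original case."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def significant_stems(sentence):
    """Map a sentence to its set of non-stop-word stems (word order is discarded)."""
    return {crude_stem(w) for w in tokenize(sentence) if w.lower() not in STOP_WORDS}


def shortest_window(words, targets):
    """Return the shortest contiguous run of words whose stems cover all targets."""
    best = None
    for start in range(len(words)):
        seen = set()
        for end in range(start, len(words)):
            stem = crude_stem(words[end])
            if stem in targets:
                seen.add(stem)
            if seen == targets:
                if best is None or end - start < best[1] - best[0]:
                    best = (start, end)
                break
    return words[best[0]: best[1] + 1] if best else []


def select_label(headlines, min_fraction=0.5):
    """Pick a contiguous phrase from the headline that best covers well-supported stems."""
    stem_sets = [significant_stems(h) for h in headlines]
    doc_freq = Counter(stem for stems in stem_sets for stem in stems)
    # Assumed notion of support: a stem occurs in more than min_fraction of headlines.
    common = {s for s, n in doc_freq.items() if n > min_fraction * len(headlines)}
    best_words, best_key = [], (0, 0)
    for headline, stems in zip(headlines, stem_sets):
        covered = stems & common
        if not covered:
            continue
        window = shortest_window(tokenize(headline), covered)
        key = (len(covered), -len(window))  # more coverage first, then shorter phrase
        if not best_words or key > best_key:
            best_words, best_key = window, key
    return " ".join(best_words)


headlines = [
    "Apple unveils new iPhone at San Francisco event",
    "New iPhone unveiled by Apple",
    "Apple unveils iPhone, shares rise",
    "Analysts react as Apple unveils new iPhone",
]
print(select_label(headlines))  # prints: Apple unveils new iPhone
```

In this toy cluster the first two headlines map to overlapping stem sets even though their word order differs, which is the sense in which a stem set approximates meaning for comparison. The label is then read off the original headline as a contiguous span, so the word order dropped by the set representation is recovered.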
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thirunarayan, K., Immaneni, T., Shaik, M.V. (2007). Selecting Labels for News Document Clusters. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73351-5_11
DOI: https://doi.org/10.1007/978-3-540-73351-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73350-8
Online ISBN: 978-3-540-73351-5
eBook Packages: Computer Science (R0)