Selecting Labels for News Document Clusters

Thirunarayan, Krishnaprasad; Immaneni, Trivikram; Shaik, Mastan Vali

doi:10.1007/978-3-540-73351-5_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4592)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

This work addresses the determination of meaningful and terse cluster labels for news document clusters. We analyze a number of alternatives for selecting headlines and/or sentences of documents in a document cluster (obtained as a result of an entity-event-duration query), and formalize an approach to extracting a short phrase from well-supported headlines/sentences of the cluster that can serve as the cluster label. Our technique maps a sentence into a set of significant stems to approximate its semantics for comparison. Finally, a cluster label is extracted from a selected headline/sentence as a contiguous sequence of words, restoring the word-sequencing information lost in the formalization of semantic equivalence.
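
The pipeline outlined in the abstract (stem sets for comparison, selection of a well-supported headline, extraction of a contiguous word sequence) might be sketched roughly as follows. This is an illustrative reading under stated assumptions, not the authors' formalization: the stopword list, the suffix-stripping `crude_stem` helper, the `min_support` threshold, and the rule of taking the span from the first to the last well-supported word are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the one used in the paper.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and",
             "at", "by", "with", "is", "as", "after", "over"}


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_set(sentence):
    """Map a sentence to its set of significant (non-stopword) stems."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return {crude_stem(w) for w in words if w not in STOPWORDS}


def select_label(headlines, min_support=0.75):
    """Pick a label as a contiguous word span of one headline.

    A stem counts as well supported if it occurs in at least
    min_support * len(headlines) headlines (an assumed criterion).
    """
    counts = Counter(s for h in headlines for s in stem_set(h))
    threshold = min_support * len(headlines)
    common = {s for s, c in counts.items() if c >= threshold}
    if not common:
        return ""
    # Choose the headline that covers the most well-supported stems.
    best = max(headlines, key=lambda h: len(stem_set(h) & common))
    words = re.findall(r"[A-Za-z0-9]+", best)
    # Positions of words in the chosen headline whose stem is well supported.
    hits = [i for i, w in enumerate(words)
            if w.lower() not in STOPWORDS and crude_stem(w.lower()) in common]
    # The span from first to last hit keeps the original word order.
    return " ".join(words[hits[0]: hits[-1] + 1])


headlines = [
    "Toyota recalls hybrid cars over brake problems",
    "Brake problems force Toyota recall",
    "Toyota to recall cars in Japan",
    "Analysts weigh Toyota recall costs",
]
print(select_label(headlines))  # prints "Toyota recalls"
```

Comparing stem sets discards word order, so "Toyota recalls" and "Toyota recall" count as equivalent; reading the label back off a concrete headline is what restores a readable word sequence. Lowering `min_support` admits more stems and generally yields a longer, less terse label.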

Author information

Authors and Affiliations

Metadata and Languages Laboratory, Department of Computer Science and Engineering, Wright State University, Dayton, Ohio-45435, USA
Krishnaprasad Thirunarayan, Trivikram Immaneni & Mastan Vali Shaik

Editor information

Zoubida Kedad, Nadira Lammari, Elisabeth Métais, Farid Meziane, Yacine Rezgui

Cite this paper

Thirunarayan, K., Immaneni, T., Shaik, M.V. (2007). Selecting Labels for News Document Clusters. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73351-5_11

DOI: https://doi.org/10.1007/978-3-540-73351-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73350-8
Online ISBN: 978-3-540-73351-5
eBook Packages: Computer Science, Computer Science (R0)
