How Different Are Language Models and Word Clouds?

Kaptein, Rianne; Hiemstra, Djoerd; Kamps, Jaap

doi:10.1007/978-3-642-12275-0_48

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Word clouds are a summarised representation of a document’s text, similar to tag clouds, which summarise the tags assigned to documents. Word clouds are similar to language models in the sense that they represent a document by its word distribution. In this paper we investigate the differences between word cloud and language modelling approaches, and specifically whether effective language modelling techniques also improve word clouds. We evaluate the quality of the language model using a system evaluation test bed, and evaluate the quality of the resulting word cloud with a user study. Our experiments show that different language modelling techniques can be applied to improve a standard word cloud that uses a TF weighting scheme in combination with stopword removal. Including bigrams in the word clouds and using a parsimonious term weighting scheme are the most effective techniques in both the system evaluation and the user study.
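
To make the contrast between the two weighting schemes concrete, the following is a minimal sketch and not the authors' implementation. It compares a TF weighting with stopword removal against an EM-based parsimonious estimate, which repeatedly shifts probability mass away from terms that the background collection already explains well. The function names, the toy stopword list, the whitespace tokeniser, and the values lam=0.1, iters=20 and threshold=1e-4 are all assumptions chosen for illustration.

```python
from collections import Counter

# Toy stopword list (an assumption; real systems use a much longer list).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "on"}


def tokens(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def tf_weights(doc, stopwords=STOPWORDS):
    """Baseline: relative term frequency after stopword removal."""
    counts = Counter(t for t in tokens(doc) if t not in stopwords)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def parsimonious_weights(doc, collection, lam=0.1, iters=20, threshold=1e-4):
    """EM estimate of a parsimonious document model (illustrative sketch).

    E-step: e_t = tf(t, D) * lam * P(t|D) / ((1 - lam) * P(t|C) + lam * P(t|D))
    M-step: P(t|D) = e_t / sum(e), dropping terms that fall below `threshold`.
    No stopword list is used here; common terms are down-weighted by the
    background model P(t|C) instead.
    """
    tf = Counter(tokens(doc))
    background = Counter(t for d in collection for t in tokens(d))
    bg_total = sum(background.values())
    p_c = {t: background.get(t, 0) / bg_total for t in tf}

    total = sum(tf.values())
    p_d = {t: c / total for t, c in tf.items()}  # start from the ML estimate

    for _ in range(iters):
        e = {
            t: tf[t] * lam * p / ((1 - lam) * p_c[t] + lam * p)
            for t, p in p_d.items()
        }
        norm = sum(e.values())
        kept = {t: v / norm for t, v in e.items() if v / norm >= threshold}
        z = sum(kept.values())
        p_d = {t: p / z for t, p in kept.items()}  # renormalise after pruning
    return p_d


if __name__ == "__main__":
    doc = "the word cloud shows the language model of the document"
    collection = [doc, "the cat sat on the mat",
                  "the dog and the cat", "the end of the day"]
    print(sorted(tf_weights(doc).items(), key=lambda kv: -kv[1]))
    print(sorted(parsimonious_weights(doc, collection).items(),
                 key=lambda kv: -kv[1]))
```

In this sketch the terms of a cloud would be the highest-weighted entries of either dictionary, with font size scaled by weight. Bigrams could be handled the same way by adding adjacent token pairs to the token stream, which is left out here to keep the example short.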

Author information

Authors and Affiliations

Archives and Information Studies, University of Amsterdam, The Netherlands
Rianne Kaptein & Jaap Kamps
Database Group, University of Twente, Enschede, The Netherlands
Djoerd Hiemstra
ISLA, Informatics Institute, University of Amsterdam, The Netherlands
Jaap Kamps

Editor information

Editors and Affiliations

Adaptive Information Cluster, Dublin City University, Dublin, 9, Ireland
Cathal Gurrin
The Open University, Walton Hall, MK7 6HF, Milton Keynes, UK
Yulan He
Microsoft Research Ltd, 7 JJ Thomson Avenue, CB3 0FB, Cambridge, UK
Gabriella Kazai
Department of Computer Science, University of Essex, Wivenhoe Park, CO4 3SQ, Colchester, UK
Udo Kruschwitz
The Open University, Walton Hall, Milton Keynes, UK
Suzanne Little
University of London, London, UK
Thomas Roelleke
Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK
Stefan Rüger
Department of Computing Science, University of Glasgow, 17 Lilybank Gardens, G12 8QQ, Glasgow, UK
Keith van Rijsbergen

About this paper

Cite this paper

Kaptein, R., Hiemstra, D., Kamps, J. (2010). How Different Are Language Models and Word Clouds? In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_48

DOI: https://doi.org/10.1007/978-3-642-12275-0_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12274-3
Online ISBN: 978-3-642-12275-0
eBook Packages: Computer Science, Computer Science (R0)