
How Different Are Language Models and Word Clouds?

  • Conference paper
Advances in Information Retrieval (ECIR 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)


Abstract

Word clouds are a summarised representation of a document’s text, similar to tag clouds which summarise the tags assigned to documents. Word clouds are similar to language models in the sense that they represent a document by its word distribution. In this paper we investigate the differences between word cloud and language modelling approaches, and specifically whether effective language modelling techniques also improve word clouds. We evaluate the quality of the language model using a system evaluation test bed, and evaluate the quality of the resulting word cloud with a user study. Our experiments show that different language modelling techniques can be applied to improve a standard word cloud that uses a TF weighting scheme in combination with stopword removal. Including bigrams in the word clouds and a parsimonious term weighting scheme are the most effective in both the system evaluation and the user study.
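The techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stopword list, the function names, the rule for keeping bigrams, and the values of `lam` and `iterations` are all assumptions made for illustration. The re-estimation loop follows the general EM form of parsimonious language models as described by Hiemstra, Robertson and Zaragoza (SIGIR 2004), where terms that the background collection already explains well lose probability mass in the document model.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_terms(text, use_bigrams=False):
    """Count unigrams with stopwords removed and, optionally, bigrams."""
    tokens = tokenize(text)
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    if use_bigrams:
        # Assumption: keep a bigram only if neither word is a stopword.
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[first + " " + second] += 1
    return counts


def parsimonious_weights(doc_counts, background_probs, lam=0.1, iterations=20):
    """Re-estimate P(t|D) with EM, discounting background-heavy terms."""
    total = sum(doc_counts.values())
    # Start from the maximum-likelihood (plain TF) estimate.
    p_doc = {t: c / total for t, c in doc_counts.items()}
    for _ in range(iterations):
        expected = {}
        for t, c in doc_counts.items():
            # E-step: share of each count attributed to the document model.
            doc_part = lam * p_doc[t]
            # Assumption: unseen background terms get a tiny floor probability.
            bg_part = (1 - lam) * background_probs.get(t, 1e-9)
            expected[t] = c * doc_part / (doc_part + bg_part)
        # M-step: renormalise the expected counts into a distribution.
        norm = sum(expected.values())
        p_doc = {t: e / norm for t, e in expected.items()}
    return p_doc
```

With equal term frequencies, a term that is rare in the background collection ends up with a larger weight than a term that is common there, which is the behaviour a word cloud would use to size its terms. Plain TF weighting would size both terms identically.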




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kaptein, R., Hiemstra, D., Kamps, J. (2010). How Different Are Language Models and Word Clouds? In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_48


  • DOI: https://doi.org/10.1007/978-3-642-12275-0_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12274-3

  • Online ISBN: 978-3-642-12275-0

  • eBook Packages: Computer Science, Computer Science (R0)
