Corpus Clouds - Facilitating Text Analysis by Means of Visualizations

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6562)


Large text corpora are a main language resource for the human-driven analysis of linguistic phenomena. As the amount of data keeps growing, it is vital to find ways to help people understand it, and visualization techniques provide one such way. Corpus Clouds is a program that provides visualizations of different types of frequency information, dynamically derived from a corpus via a standard query system and integrated with a standard KWIC (keyword in context) display. We apply established principles from information visualization to provide dynamic, interactive representations of the query results. We discuss the selected design principles and alternatives to the implementation, and give a preview of what other types of corpus-related information can be visualized in similar ways. Corpus Clouds can thus be seen as an answer to the call by Collins et al. [1] to design new visualization tools for linguistic data in a principled way.
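To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a KWIC listing and of a frequency-to-font-size mapping of the kind a word cloud needs. It is not the Corpus Clouds implementation: the function names (`tokenize`, `kwic`, `cloud_sizes`), the regex tokenizer, the point-size range, and the logarithmic scaling are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def cloud_sizes(tokens, min_pt=10, max_pt=40):
    """Map each word's frequency to a font size on a log scale (assumed)."""
    counts = Counter(tokens)
    if not counts:
        return {}
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = (hi - lo) or 1.0  # avoid division by zero when all counts are equal
    return {
        word: round(min_pt + (math.log(n) - lo) / span * (max_pt - min_pt))
        for word, n in counts.items()
    }


tokens = tokenize("The cat sat on the mat and the dog sat too")
for left, key, right in kwic(tokens, "sat", width=2):
    print(f"{left:>10} [{key}] {right}")
print(cloud_sizes(tokens))
```

The log scale is one common choice because word frequencies in corpora are highly skewed; a linear mapping would leave most words at the minimum size.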


corpus linguistics, visualization




  1. Collins, C., Penn, G., Carpendale, S.: Interactive Visualization for Computational Linguistics. In: ACL 2008: HLT Tutorials (2008)
  2. Card, S.K., Mackinlay, J., Shneiderman, B.: Readings in Information Visualization: Using Vision to Think. Academic Press, San Diego (1999)
  3. Ware, C.: Information Visualization, 2nd edn. Perception for Design. Elsevier, Inc., San Francisco (2004)
  4. Collins, C.: A Critical Review of Information Visualizations for Natural Language. PhD qualifying exam paper, University of Toronto (2005)
  5. Wattenberg, M., Viégas, F.B.: The Word Tree, an Interactive Visual Concordance. IEEE Trans. on Visualization and Computer Graphics 14(6), 1221–1228 (2008)
  6. Hearst, M.A.: Tilebars: Visualization of Term Distribution Information in Full Text Information Access. In: CHI 1995, Denver, Colorado, pp. 56–66 (1995)
  7. Wattenberg, M.: Arc Diagrams: Visualizing Structure in Strings. In: IEEE Symposium on Information Visualization, pp. 110–116. IEEE Computer Society Press, Washington (2002)
  8. Widdows, D., Cederberg, S., Dorow, B.: Visualisation Techniques for Analysing Meaning. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2002. LNCS (LNAI), vol. 2448, pp. 107–114. Springer, Heidelberg (2002)
  9. DeCamp, P., Frid-Jimenez, A., Guiness, J., Roy, D.: Gist Icons: Seeing Meaning in Large Bodies of Literature. In: IEEE Symposium on Information Visualization. IEEE Computer Society Press, Washington (2005)
  10. Collins, C.: Docuburst: Radial Space-filling Visualization of Document Content. Technical Report KMDI-TR-2007-1, Knowledge Media Design Institute, University of Toronto (2007)
  11. Rohrer, R.M., Sibert, J.L., Ebert, D.S.: The Shape of Shakespeare: Visualizing Text Using Implicit Surfaces. In: IEEE Symposium on Information Visualization, pp. 121–129. IEEE Computer Society Press, Washington (1998)
  12.
  13.
  14. Shneiderman, B.: The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In: IEEE Symposium on Visual Languages, pp. 336–343. IEEE Computer Society Press, Washington (1996)
  15. Tufte, E.: Beautiful Evidence. Graphics Press, Cheshire (2006)
  16. Kennedy, G.: An Introduction to Corpus Linguistics. Longman, London (1998)
  17. Christ, O.: A Modular and Flexible Architecture for an Integrated Corpus Query System. In: 3rd Conference on Computational Lexicography and Text Research, Budapest, pp. 23–32 (1994)
  18. Scott, M.: Developing WordSmith. In: Scott, M., Pérez-Paredes, P., Sánchez-Hernández, P. (eds.) Software-aided Analysis of Language, special issue of International Journal of English Studies, vol. 8(1), pp. 153–172 (2008)
  19. Kilgarriff, A., Rychly, P., Smrz, P., Tugwell, D.: The Sketch Engine. In: EURALEX 2004, Lorient, pp. 105–116 (2004)
  20. Sokirko, A.: DDC – A Search Engine for Linguistically Annotated Corpora. In: Dialogue (2003)
  21. Lemnitzer, L., Zinsmeister, H.: Korpuslinguistik. Eine Einführung. Gunter Narr, Tübingen (2006)
  22. Müller, B.: Fast Faust (2000)
  23. Zipf, G.K.: Human Behavior and the Principle of Least-effort. Addison-Wesley, Cambridge (1949)
  24. Hearst, M.A., Rosner, D.: Tag Clouds: Data Analysis Tool or Social Signaller? In: 41st Annual Hawaii International Conference on System Sciences, p. 160. IEEE Computer Society, Washington (2008)
  25. Hassan-Montero, Y., Herrero-Solana, V.: Improving Tag-clouds as Visual Information Retrieval Interfaces. In: InSciT 2006, Mérida (2006)
  26. Kaser, O., Lemire, D.: Tag-Cloud Drawing: Algorithms for Cloud Visualization. In: WWW 2007 Workshop on Tagging and Metadata for Social Information Organization, Banff, Alberta (2007)
  27.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. European Academy Bozen/Bolzano, Bolzano, Italy