Abstract
Large text corpora are a primary language resource for the human-driven analysis of linguistic phenomena. With the ever-increasing amount of data, it is vital to find ways to help people understand the data, and visualization techniques provide one way to do that. Corpus Clouds is a program that provides visualizations of different types of frequency information, derived dynamically from a corpus via a standard query system and integrated with a standard KWIC display. We apply established principles from information visualization to provide dynamic, interactive representations of the query results. We discuss the selected design principles and alternatives to the implementation, and give a preview of other types of corpus-related information that can be visualized in similar ways. Corpus Clouds can thus be seen as an answer to the call by Collins et al. [1] to design new visualization tools for linguistic data in a principled way.
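As a rough illustration of the kind of frequency visualization described above, the following minimal sketch maps word frequencies from a set of query hits to font sizes for a word cloud. It is not the Corpus Clouds implementation: the function name `cloud_sizes`, the point-size range, the sample hits, and the choice of logarithmic scaling are all assumptions made for this example. Log scaling is a common choice because word frequencies tend to be strongly skewed.

```python
import math
from collections import Counter


def cloud_sizes(tokens, min_pt=10, max_pt=36):
    """Map each distinct token to a font size (points) by log-scaled frequency.

    Hypothetical helper for illustration only; the size range is an assumption.
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # When every word has the same frequency, span is 0: use the smallest size.
        weight = (math.log(n) - lo) / span if span else 0.0
        sizes[word] = round(min_pt + weight * (max_pt - min_pt))
    return sizes


# Made-up stand-in for the word forms matched by a corpus query.
hits = ["house", "home", "house", "building", "house", "home"]
sizes = cloud_sizes(hits)
for word, pt in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {pt}pt")
```

The most frequent form receives the largest size and the least frequent the smallest; intermediate forms fall in between on a logarithmic scale.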
References
Collins, C., Penn, G., Carpendale, S.: Interactive Visualization for Computational Linguistics. In: ACL 2008: HLT Tutorials (2008), http://www.cs.utoronto.ca/~ccollins/acl2008-vis.pdf
Card, S.K., Mackinlay, J., Shneiderman, B.: Readings in Information Visualization: Using Vision to Think. Academic Press, San Diego (1999)
Ware, C.: Information Visualization: Perception for Design, 2nd edn. Elsevier, Inc., San Francisco (2004)
Collins, C.: A Critical Review of Information Visualizations for Natural Language. PhD qualifying exam paper, University of Toronto (2005), http://www.cs.utoronto.ca/~ccollins/publications/docs/depthPaper.pdf
Wattenberg, M., Viégas, F.B.: The Word Tree, an Interactive Visual Concordance. IEEE Trans. on Visualization and Computer Graphics 14(6), 1221–1228 (2008)
Hearst, M.A.: Tilebars: Visualization of Term Distribution Information in Full Text Information Access. In: CHI 1995, Denver, Colorado, pp. 56–66 (1995)
Wattenberg, M.: Arc Diagrams: Visualizing Structure in Strings. In: IEEE Symposium on Information Visualization, pp. 110–116. IEEE Computer Society Press, Washington (2002)
Widdows, D., Cederberg, S., Dorow, B.: Visualisation Techniques for Analysing Meaning. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2002. LNCS (LNAI), vol. 2448, pp. 107–114. Springer, Heidelberg (2002)
DeCamp, P., Frid-Jimenez, A., Guiness, J., Roy, D.: Gist Icons: Seeing Meaning in Large Bodies of Literature. In: IEEE Symposium on Information Visualization. IEEE Computer Society Press, Washington (2005)
Collins, C.: Docuburst: Radial Space-filling Visualization of Document Content. Technical Report KMDI-TR-2007-1, Knowledge Media Design Institute, University of Toronto (2007)
Rohrer, R.M., Sibert, J.L., Ebert, D.S.: The Shape of Shakespeare: Visualizing Text Using Implicit Surfaces. In: IEEE Symposium on Information Visualization, pp. 121–129. IEEE Computer Society Press, Washington (1998)
TAPoR, http://portal.tapor.ca/
Shneiderman, B.: The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In: IEEE Symposium on Visual Languages, pp. 336–343. IEEE Computer Society Press, Washington (1996)
Tufte, E.: Beautiful Evidence. Graphics Press, Cheshire (2006)
Kennedy, G.: An Introduction to Corpus Linguistics. Longman, London (1998)
Christ, O.: A Modular and Flexible Architecture for an Integrated Corpus Query System. In: 3rd Conference on Computational Lexicography and Text Research, Budapest, pp. 23–32 (1994)
Scott, M.: Developing WordSmith. In: Scott, M., Pérez-Paredes, P., Sánchez-Hernández, P. (eds.) Software-aided Analysis of Language, special issue of International Journal of English Studies, vol. 8(1), pp. 153–172 (2008)
Kilgarriff, A., Rychly, P., Smrz, P., Tugwell, D.: The Sketch Engine. In: EURALEX 2004, Lorient, pp. 105–116 (2004)
Sokirko, A.: DDC – A Search Engine for Linguistically Annotated Corpora. In: Dialogue (2003)
Lemnitzer, L., Zinsmeister, H.: Korpuslinguistik. Eine Einführung. Gunter Narr, Tübingen (2006)
Müller, B.: Fast Faust (2000), http://www.esono.com/boris/projects/faust/
Zipf, G.K.: Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge (1949)
Hearst, M.A., Rosner, D.: Tag Clouds: Data Analysis Tool or Social Signaller? In: 41st Annual Hawaii International Conference on System Sciences, p. 160. IEEE Computer Society, Washington (2008)
Hassan-Montero, Y., Herrero-Solana, V.: Improving Tag-clouds as Visual Information Retrieval Interfaces. In: InSciT 2006, Mérida (2006)
Kaser, O., Lemire, D.: Tag-Cloud Drawing: Algorithms for Cloud Visualization. In: WWW 2007 Workshop on Tagging and Metadata for Social Information Organization, Banff, Alberta (2007)
Google Visualization API, http://code.google.com/apis/visualization/documentation/gallery.html
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Culy, C., Lyding, V. (2011). Corpus Clouds - Facilitating Text Analysis by Means of Visualizations. In: Vetulani, Z. (ed.) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2009. Lecture Notes in Computer Science, vol 6562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20095-3_32
DOI: https://doi.org/10.1007/978-3-642-20095-3_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20094-6
Online ISBN: 978-3-642-20095-3
eBook Packages: Computer Science, Computer Science (R0)