Corpus Clouds - Facilitating Text Analysis by Means of Visualizations
Abstract
Large text corpora are a primary language resource for the human-driven analysis of linguistic phenomena. With the ever-increasing amount of data, it is vital to find ways to help people understand the data, and visualization techniques provide one way to do that. Corpus Clouds is a program that visualizes different types of frequency information derived dynamically from a corpus via a standard query system, and integrates these visualizations with a standard KWIC (keyword in context) display. We apply established principles from information visualization to provide dynamic, interactive representations of the query results. We discuss the selected design principles and alternatives to the implementation, and give a preview of other types of corpus-related information that can be visualized in similar ways. Corpus Clouds can thus be seen as an answer to the call by Collins et al. [1] to design new visualization tools for linguistic data in a principled way.
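The two ingredients named in the abstract, a KWIC display and frequency information rendered as a word cloud, can be illustrated with a minimal Python sketch. This is not the Corpus Clouds implementation: the function names (`tokenize`, `kwic`, `cloud_sizes`), the whitespace-and-punctuation tokenizer, and the linear mapping from frequency to font size are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit of `keyword`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def cloud_sizes(tokens, min_pt=10, max_pt=40):
    """Map each word's frequency linearly onto a font size in points.

    Linear scaling is an illustrative choice, not necessarily the scaling
    used by Corpus Clouds.
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_pt + (count - lo) / span * (max_pt - min_pt))
        for word, count in counts.items()
    }


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat and the dog sat too.")
    for left, key, right in kwic(tokens, "sat", width=2):
        print(f"{left:>10} [{key}] {right}")
    print(cloud_sizes(tokens))
```

In a real corpus tool the hits would come from a query engine rather than a linear scan, and the size mapping would typically be tuned (for example with logarithmic scaling) because word frequencies are highly skewed.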
Keywords
corpus linguistics, visualization
References
- 1. Collins, C., Penn, G., Carpendale, S.: Interactive Visualization for Computational Linguistics. In: ACL 2008: HLT Tutorials (2008), http://www.cs.utoronto.ca/~ccollins/acl2008-vis.pdf
- 2. Card, S.K., Mackinlay, J., Shneiderman, B.: Readings in Information Visualization: Using Vision to Think. Academic Press, San Diego (1999)
- 3. Ware, C.: Information Visualization, 2nd edn. Perception for Design. Elsevier, Inc., San Francisco (2004)
- 4. Collins, C.: A Critical Review of Information Visualizations for Natural Language. PhD qualifying exam paper, University of Toronto (2005), http://www.cs.utoronto.ca/~ccollins/publications/docs/depthPaper.pdf
- 5. Wattenberg, M., Viégas, F.B.: The Word Tree, an Interactive Visual Concordance. IEEE Trans. on Visualization and Computer Graphics 14(6), 1221–1228 (2008)
- 6. Hearst, M.A.: Tilebars: Visualization of Term Distribution Information in Full Text Information Access. In: CHI 1995, Denver, Colorado, pp. 56–66 (1995)
- 7. Wattenberg, M.: Arc Diagrams: Visualizing Structure in Strings. In: IEEE Symposium on Information Visualization, pp. 110–116. IEEE Computer Society Press, Washington (2002)
- 8. Widdows, D., Cederberg, S., Dorow, B.: Visualisation Techniques for Analysing Meaning. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2002. LNCS (LNAI), vol. 2448, pp. 107–114. Springer, Heidelberg (2002)
- 9. DeCamp, P., Frid-Jimenez, A., Guiness, J., Roy, D.: Gist Icons: Seeing Meaning in Large Bodies of Literature. In: IEEE Symposium on Information Visualization. IEEE Computer Society Press, Washington (2005)
- 10. Collins, C.: Docuburst: Radial Space-filling Visualization of Document Content. Technical Report KMDI-TR-2007-1, Knowledge Media Design Institute, University of Toronto (2007)
- 11. Rohrer, R.M., Sibert, J.L., Ebert, D.S.: The Shape of Shakespeare: Visualizing Text Using Implicit Surfaces. In: IEEE Symposium on Information Visualization, pp. 121–129. IEEE Computer Society Press, Washington (1998)
- 12. TAPoR, http://portal.tapor.ca/
- 13.
- 14. Shneiderman, B.: The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In: IEEE Symposium on Visual Languages, pp. 336–343. IEEE Computer Society Press, Washington (1996)
- 15. Tufte, E.: Beautiful Evidence. Graphics Press, Cheshire (2006)
- 16. Kennedy, G.: An Introduction to Corpus Linguistics. Longman, London (1998)
- 17. Christ, O.: A Modular and Flexible Architecture for an Integrated Corpus Query System. In: 3rd Conference on Computational Lexicography and Text Research, Budapest, pp. 23–32 (1994)
- 18. Scott, M.: Developing WordSmith. In: Scott, M., Pérez-Paredes, P., Sánchez-Hernández, P. (eds.) Software-aided Analysis of Language, special issue of International Journal of English Studies, vol. 8(1), pp. 153–172 (2008)
- 19. Kilgarriff, A., Rychly, P., Smrz, P., Tugwell, D.: The Sketch Engine. In: EURALEX 2004, Lorient, pp. 105–116 (2004)
- 20. Sokirko, A.: DDC – A Search Engine for Linguistically Annotated Corpora. In: Dialogue (2003)
- 21. Lemnitzer, L., Zinsmeister, H.: Korpuslinguistik. Eine Einführung. Gunter Narr, Tübingen (2006)
- 22. Müller, B.: Fast Faust (2000), http://www.esono.com/boris/projects/faust/
- 23. Zipf, G.K.: Human Behavior and the Principle of Least-effort. Addison-Wesley, Cambridge (1949)
- 24. Hearst, M.A., Rosner, D.: Tag Clouds: Data Analysis Tool or Social Signaller? In: 41st Annual Hawaii International Conference on System Sciences, p. 160. IEEE Computer Society, Washington (2008)
- 25. Hassan-Montero, Y., Herrero-Solana, V.: Improving Tag-clouds as Visual Information Retrieval Interfaces. In: InSciT 2006, Mérida (2006)
- 26. Kaser, O., Lemire, D.: Tag-Cloud Drawing: Algorithms for Cloud Visualization. In: WWW 2007 Workshop on Tagging and Metadata for Social Information Organization, Banff, Alberta (2007)
- 27. Google Visualization API, http://code.google.com/apis/visualization/documentation/gallery.html