Abstract
In this chapter, we make an inventory of the tools suitable for supporting exploratory data analysis. Our major point is that the primary tool for analysis is the human imaginative mind, and that all other tools are supplementary. Only the human mind actually does the analysis; the other tools supply it with the necessary material, appropriately prepared and presented. The most appropriate form for the presentation of such material is visual, since the mind, as most scientists tend to agree, operates predominantly with images.
The techniques and software tools usable in exploratory data analysis are currently very numerous, and new tools continue to appear. It would be completely unfeasible to survey all of them. Therefore, we have tried instead to set out the major tool categories and describe the key functions and properties of each category. The resulting classification looks as follows:
- Visualisation. The primary function of this tool category is representation of data in a visual form, i.e. creating various pictures from data: graphs, plots, diagrams, maps, etc. For this purpose, elements of data are translated into graphical features, such as positions within a display, colours, sizes, or shapes. It is important, however, that these graphical features coalesce into a single image rather than being perceived separately.
  We divide the visual expressive means into display dimensions and visual, or retinal, variables. Display dimensions provide a set of positions within a display at which graphical elements, or marks, can be placed. Retinal variables represent various properties of the marks: shape, size, colour, texture, orientation, etc. In addition to the visual dimensions of a display, such as width, height, or depth, we consider also the display time, which can be used, for example, in animated presentations.
- Display manipulation. This class consists of interactive tools that support dynamic modification of the appearance of visual displays. The general purpose of such modification is to enhance the image produced: to make it clearer and easier to perceive, to accentuate the distinctive features of the data represented, to focus on a particular item or subset of interest, etc. The manipulation is done through modifying the formula or algorithm used for the translation of data elements into visual features (we call this formula or algorithm the “visual encoding function”).
- Data manipulation, i.e. derivation of new references and characteristics from existing ones. There are two major purposes in doing this: to simplify the data and make it easier to analyse, and, conversely, to enrich the data and consider various aspects of it. Thus, data aggregation reduces the amount of data and hence simplifies the analysis. Data interpolation, in contrast, produces additional data.
- Querying, i.e. the automated search for answers to user-specified questions. Most typically, this is to search for references with specified characteristics or to search for the characteristics of specified references. Dynamic query tools, which allow the user to easily modify query conditions and quickly provide the required answer, are especially important for EDA.
- Computation. In this category, we briefly consider the computational methods of statistics and data mining. Unlike the computations involved in data manipulation, which prepare data for further analysis, for example by transforming the data into a more suitable form, the function of computational tools is a kind of data distillation, or extraction of the essential features of data. Some examples of the outputs produced by computational tools are statistical characteristics of a dataset as a whole, indicators of relatedness between attributes, and models that predict some characteristics on the basis of other characteristics, in particular, future developments on the basis of the current state and of the history.
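The notion of a visual encoding function, and of display manipulation as a change to that function, can be illustrated with a minimal sketch. It is not taken from the chapter: the class name `LinearSizeEncoding`, its parameters, and the sample values are all hypothetical, and a linear mapping from attribute value to mark size is only one of many possible encodings.

```python
from dataclasses import dataclass


@dataclass
class LinearSizeEncoding:
    """Hypothetical visual encoding function: attribute value -> mark size."""

    data_min: float
    data_max: float
    size_min: float = 2.0   # assumed smallest mark size, in display units
    size_max: float = 20.0  # assumed largest mark size, in display units

    def __call__(self, value: float) -> float:
        # Clamp the value to the encoded range so that values outside
        # the range of interest still receive a valid mark size.
        clamped = min(max(value, self.data_min), self.data_max)
        span = self.data_max - self.data_min
        t = 0.0 if span == 0 else (clamped - self.data_min) / span
        return self.size_min + t * (self.size_max - self.size_min)

    def focus(self, new_min: float, new_max: float) -> "LinearSizeEncoding":
        """Display manipulation: re-encode with the size range spent on a sub-range."""
        return LinearSizeEncoding(new_min, new_max, self.size_min, self.size_max)


values = [0.0, 10.0, 50.0, 100.0]  # illustrative attribute values

encode = LinearSizeEncoding(data_min=0.0, data_max=100.0)
sizes_full = [encode(v) for v in values]

# Focusing on the sub-range 0..50 changes only the encoding, not the data.
focused = encode.focus(0.0, 50.0)
sizes_focused = [focused(v) for v in values]
```

In this sketch the data stay untouched; only the encoding changes, so differences among the lower values receive more of the available size range once the display is focused on them.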
In exploratory data analysis, it is usually not enough to use a single tool. Various tools need to be combined. We consider two basic modes of tool combination, sequential and concurrent, and discuss the various mechanisms used for tool combination. Visualisation is an essential component of any tool ensemble. Initial data visualisation is used in order to understand what tools should be used for further work. Results produced by any non-visual tool need to be visualised so that the analyst can see and interpret them.
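A sequential combination of tools might look like the following sketch, in which the output of one step becomes the input of the next. The records, the function names `interpolate_linear`, `aggregate_mean`, and `query`, and the threshold are hypothetical illustrations rather than anything specified in the chapter. Each record pairs a reference (district, year) with a characteristic (a numeric value).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (district, year, value).
records = [
    ("north", 2000, 10.0), ("north", 2002, 14.0),
    ("south", 2000, 30.0), ("south", 2002, 20.0),
]


def interpolate_linear(t0, v0, t1, v1, t):
    """Data manipulation (enrichment): estimate a value between two known moments."""
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def aggregate_mean(rows):
    """Data manipulation (simplification): reduce the rows to one mean per district."""
    groups = defaultdict(list)
    for district, _year, value in rows:
        groups[district].append(value)
    return {district: mean(vals) for district, vals in groups.items()}


def query(rows, predicate):
    """Querying: find the references whose characteristic satisfies a condition."""
    return [(district, year) for district, year, value in rows if predicate(value)]


# Step 1: interpolation produces additional data for the missing year 2001.
enriched = list(records)
enriched.append(("north", 2001, interpolate_linear(2000, 10.0, 2002, 14.0, 2001)))
enriched.append(("south", 2001, interpolate_linear(2000, 30.0, 2002, 20.0, 2001)))

# Step 2: aggregation reduces the enriched data to one value per district.
district_means = aggregate_mean(enriched)

# Step 3: a query over the enriched data; changing the threshold and
# re-running this line mimics what a dynamic query tool does interactively.
hits = query(enriched, lambda v: v >= 20.0)
```

The results of each step would then be passed to a visual display, since the output of a non-visual tool still has to be seen to be interpreted.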
Throughout this chapter, we provide many examples of various tools. Even when discussing non-visual tools such as data manipulation or computational methods, we use visualisation intensively to illustrate the examples. Readers can easily note that we have taken every opportunity to stress the great role of visualisation in exploratory data analysis. At the beginning of the chapter, we make an attempt to substantiate the importance of visualisation.
© 2006 Springer-Verlag Berlin Heidelberg
(2006). Tools. In: Exploratory Analysis of Spatial and Temporal Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31190-4_4
DOI: https://doi.org/10.1007/3-540-31190-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25994-7
Online ISBN: 978-3-540-31190-4
eBook Packages: Computer Science (R0)