Abstract

In this chapter, we make an inventory of the tools suitable for supporting exploratory data analysis. Our major point is that the primary tool for analysis is the human imaginative mind, and that all other tools are supplementary. Only the human mind actually does the analysis; the other tools supply it with the necessary material, appropriately prepared and presented. The most appropriate form for the presentation of such material is visual, since the mind, as most scientists tend to agree, operates predominantly with images.

The techniques and software tools usable in exploratory data analysis are currently very numerous, and new tools continue to appear. It would be completely unfeasible to survey all of them. Therefore, we have tried instead to set out the major tool categories and describe the key functions and properties of each category. The resulting classification looks as follows:

  • Visualisation. The primary function of this tool category is representation of data in a visual form, i.e. creating various pictures from data: graphs, plots, diagrams, maps, etc. For this purpose, elements of data are translated into graphical features, such as positions within a display, colours, sizes, or shapes. It is important, however, that these graphical features coalesce into a single image rather than being perceived separately.

We divide the visual expressive means into display dimensions and visual, or retinal, variables. Display dimensions provide a set of positions within a display at which graphical elements, or marks, can be placed. Retinal variables represent various properties of the marks: shape, size, colour, texture, orientation, etc. In addition to the visual dimensions of a display, such as width, height, or depth, we consider also the display time, which can be used, for example, in animated presentations.

  • Display manipulation. This class consists of interactive tools that support dynamic modification of the appearance of visual displays. The general purpose of such modification is to enhance the image produced: to make it clearer and easier to perceive, to accentuate the distinctive features of the data represented, to focus on a particular item or subset of interest, etc. The manipulation is done through modifying the formula or algorithm used for the translation of data elements into visual features (we call this formula or algorithm the “visual encoding function”).

  • Data manipulation, i.e. derivation of new references and characteristics from existing ones. There are two major purposes in doing this: to simplify the data and make it easier to analyse, and, conversely, to enrich the data and consider various aspects of it. Thus, data aggregation reduces the amount of data and hence simplifies the analysis. Data interpolation, in contrast, produces additional data.

  • Querying, i.e. the automated search for answers to user-specified questions. Most typically, this is to search for references with specified characteristics or to search for the characteristics of specified references. Dynamic query tools, which allow the user to easily modify query conditions and quickly provide the required answer, are especially important for EDA.

  • Computation. In this category, we briefly consider the computational methods of statistics and data mining. Unlike the computations involved in data manipulation, which prepare data for further analysis, for example by transforming the data into a more suitable form, the function of computational tools is a kind of data distillation, or extraction of the essential features of data. Some examples of the outputs produced by computational tools are statistical characteristics of a dataset as a whole, indicators of relatedness between attributes, and models that predict some characteristics on the basis of other characteristics, in particular, future developments on the basis of the current state and of the history.
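The "visual encoding function" mentioned under display manipulation can be made concrete with a small sketch. The Python below is only an illustration under our own assumptions, not the authors' implementation: the names `Mark` and `make_encoder`, the linear size scale, the colour choices, and the sample records are all invented for the example. It translates data elements into display positions and retinal variables, and then shows display manipulation as a change to the encoding function rather than to the data.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    x: float      # display dimension: horizontal position
    y: float      # display dimension: vertical position
    size: float   # retinal variable
    colour: str   # retinal variable


def make_encoder(lo, hi, max_size=20.0, focus=None):
    """Build a hypothetical visual encoding function for one numeric attribute.

    lo, hi : value range mapped linearly onto mark sizes 0..max_size
    focus  : optional (low, high) interval; values outside it are greyed out
    """
    span = (hi - lo) or 1.0

    def encode(x, y, value):
        # Clamp the value into [lo, hi], then scale it to the range 0..1.
        t = (min(max(value, lo), hi) - lo) / span
        in_focus = focus is None or focus[0] <= value <= focus[1]
        return Mark(x, y, size=t * max_size,
                    colour="red" if in_focus else "grey")

    return encode


# Invented records: (x position, y position, attribute value).
records = [(0, 0, 10.0), (1, 0, 55.0), (2, 0, 100.0)]

# Initial display: the full value range is mapped to mark size.
encode = make_encoder(lo=0.0, hi=100.0)
initial = [encode(*r) for r in records]

# Display manipulation: the data are unchanged, only the encoding
# function is modified, here to accentuate values between 50 and 100.
encode = make_encoder(lo=0.0, hi=100.0, focus=(50.0, 100.0))
focused = [encode(*r) for r in records]
```

In this sketch the interactive control (a slider, say) would only need to call `make_encoder` again with new parameters and redraw the marks.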

In exploratory data analysis, a single tool is rarely enough; various tools need to be combined. We consider two basic modes of tool combination, sequential and concurrent, and discuss the mechanisms used to combine tools. Visualisation is an essential component of any tool ensemble. An initial visualisation of the data helps the analyst decide which tools to use for further work, and the results produced by any non-visual tool need to be visualised so that the analyst can see and interpret them.
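A sequential combination of tools might look like the following sketch, which chains data manipulation (aggregation), querying, and computation, and ends by visualising the result. It is a minimal illustration under our own assumptions: the function names, the threshold, the text-based bars, and the observations are invented and do not come from the chapter.

```python
from collections import defaultdict
from statistics import mean

# Invented observations: reference (district, year) -> characteristic value.
observations = [
    (("A", 2001), 4.0), (("A", 2002), 6.0),
    (("B", 2001), 1.0), (("B", 2002), 3.0),
    (("C", 2001), 8.0), (("C", 2002), 10.0),
]


def aggregate_by_district(obs):
    """Data manipulation: simplify the data by averaging over years."""
    groups = defaultdict(list)
    for (district, _year), value in obs:
        groups[district].append(value)
    return {d: mean(vs) for d, vs in groups.items()}


def query(table, threshold):
    """Querying: find the references whose characteristic meets a condition."""
    return {ref: v for ref, v in table.items() if v >= threshold}


def text_bars(table, width=20):
    """Visualisation: one bar per reference, length proportional to value."""
    top = max(table.values())
    return [f"{ref} {'#' * round(width * v / top)} {v:g}"
            for ref, v in sorted(table.items())]


averages = aggregate_by_district(observations)  # step 1: data manipulation
selected = query(averages, threshold=3.0)       # step 2: querying
overall = mean(list(selected.values()))         # step 3: computation
bars = text_bars(selected)                      # step 4: visualise the result

for line in bars:
    print(line)
```

Each step consumes the output of the previous one, which is what makes the combination sequential; a concurrent combination would instead keep several views of the same data open and coordinated at once.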

Throughout this chapter, we provide many examples of various tools. Even when discussing non-visual tools such as data manipulation or computational methods, we use visualisation intensively to illustrate the examples. Readers can easily note that we have taken every opportunity to stress the great role of visualisation in exploratory data analysis. At the beginning of the chapter, we make an attempt to substantiate the importance of visualisation.



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

(2006). Tools. In: Exploratory Analysis of Spatial and Temporal Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31190-4_4

  • DOI: https://doi.org/10.1007/3-540-31190-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25994-7

  • Online ISBN: 978-3-540-31190-4

  • eBook Packages: Computer Science, Computer Science (R0)
