Data exploration

  • G. Barrie Wetherill
  • P. Duncombe
  • M. Kenward
  • J. Köllerström
  • S. R. Paul
  • B. J. Vowden
Part of the Monographs on Statistics and Applied Probability book series (MSAP)


The rôle of exploratory data analysis is to reveal the features of the data set under study. All statistical analyses should include a thorough exploration of the data. For some data sets this may show nothing unusual or interesting. For others, it may reveal interesting individual observations or groups of observations, unexpected structure, or unexpected relationships in the data. Failure to explore the data before embarking on a formal statistical analysis can waste much time, effort and resources or, even worse, lead to incorrect conclusions.
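
As a small illustration of one first exploratory step, the sketch below computes a five-number summary and flags observations lying beyond box-plot style fences at 1.5 times the interquartile range, a common convention associated with Tukey's box plots. This is a sketch under stated assumptions, not the chapter's own procedure. The function names `five_number_summary` and `flag_outliers` are hypothetical. The quartiles come from Python's `statistics.quantiles` with its default ('exclusive') method, which can differ slightly from Tukey's hinges. Flagged values are candidates for a closer look, not points to delete.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a sequence of at least two numbers.

    Hypothetical helper. Quartiles use statistics.quantiles with its default
    'exclusive' method, which may differ slightly from Tukey's hinges.
    """
    xs = sorted(data)
    q1, median, q3 = statistics.quantiles(xs, n=4)
    return xs[0], q1, median, q3, xs[-1]


def flag_outliers(data, k=1.5):
    """Return the values lying outside the fences [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: k=1.5 is the usual box-plot convention, not a rule.
    """
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


if __name__ == "__main__":
    # Illustrative made-up values: one observation is far from the rest.
    sample = [2.1, 2.4, 2.2, 2.5, 2.3, 9.7, 2.6, 2.2]
    print(five_number_summary(sample))
    print(flag_outliers(sample))  # → [9.7]
```

A numerical rule like this complements plots of the data rather than replacing them. A flagged value still has to be examined in context before deciding what it means.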



Copyright information

© G. Barrie Wetherill 1986

Authors and Affiliations

  • G. Barrie Wetherill, Department of Statistics, The University of Newcastle upon Tyne, UK
  • P. Duncombe, Applied Statistics Research Unit, University of Kent at Canterbury, UK
  • M. Kenward, Mathematical Institute, University of Kent at Canterbury, UK
  • J. Köllerström, Mathematical Institute, University of Kent at Canterbury, UK
  • S. R. Paul, Department of Mathematics and Statistics, University of Windsor, Canada
  • B. J. Vowden, Mathematical Institute, University of Kent at Canterbury, UK
