Abstract
This paper describes the main features of Snout, an intelligent assistant for exploratory data analysis (EDA) of social science survey data that incorporates a range of data mining techniques. EDA has much in common with existing data mining techniques: its main objective is to help an investigator understand the important relationships in a data set rather than simply to develop predictive models for selected variables. We briefly describe several novel techniques developed for use in Snout. These include heuristic variable level inference and classification, automatic category formation, the use of similarity trees to identify groups of related variables, interactive decision tree construction, and model selection using a genetic algorithm.
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Scott, P.D., Coxon, A.P.M., Hobbs, M.H., Williams, R.J. (1997). SNOUT: An intelligent assistant for exploratory data analysis. In: Komorowski, J., Zytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1997. Lecture Notes in Computer Science, vol 1263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63223-9_118
DOI: https://doi.org/10.1007/3-540-63223-9_118
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63223-8
Online ISBN: 978-3-540-69236-2
eBook Packages: Springer Book Archive