Abstract
In this chapter, we assume that the scientific question of interest has been clearly defined, the study has been designed, and the data have been collected from randomly selected members of the population. Our objective is then to obtain a high-level understanding of the data through summary statistics and data visualization techniques. The focus of this chapter is on exploring one variable at a time, setting aside any possible relationships among the variables. (Exploring the relationships among variables is discussed in the next chapter.) Here, we divide variables into two types, categorical and numerical. Distinguishing between these two types is important because the summary statistics and data visualization techniques appropriate for a variable usually depend on its type. This chapter also includes some discussion of data preprocessing.
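As a rough illustration of why the variable type matters, the sketch below summarizes a categorical variable with frequencies and relative frequencies, and a numerical variable with measures of location and spread. It is written in Python with only the standard library and is not taken from the chapter (which works in R); the function names `summarize_categorical` and `summarize_numerical` and the two small data vectors are hypothetical and made up for this example.

```python
from collections import Counter
from statistics import mean, median, stdev


def summarize_categorical(values):
    """Return (frequency, relative frequency) for each category."""
    counts = Counter(values)
    total = len(values)
    return {level: (n, n / total) for level, n in counts.items()}


def summarize_numerical(values):
    """Return common location and spread summaries for a numerical variable."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }


# Hypothetical data, invented for illustration only.
smoking_status = ["never", "former", "never", "current", "never", "former"]
age_years = [34, 51, 29, 62, 45, 38]

cat_summary = summarize_categorical(smoking_status)
num_summary = summarize_numerical(age_years)

print(cat_summary)
print(num_summary)
```

Computing a mean of `smoking_status` would be meaningless, and a frequency table of `age_years` would have nearly one row per observation, which is the practical reason for checking the variable type before choosing a summary.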
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Shahbaba, B. (2012). Data Exploration. In: Biostatistics with R. Use R!. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1302-8_2
DOI: https://doi.org/10.1007/978-1-4614-1302-8_2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1301-1
Online ISBN: 978-1-4614-1302-8
eBook Packages: Mathematics and Statistics (R0)