Data Exploration

Part of the book series: Use R! (USE R)

Abstract

In this chapter, we assume that the scientific question of interest has been clearly defined, the study has been designed, and the data have been collected from randomly selected members of the population. Our objective is then to obtain a high-level understanding of the data through summary statistics and data visualization techniques. The focus of this chapter is on exploring one variable at a time, regardless of any possible relationships among the variables. (Exploring the relationships among variables is discussed in the next chapter.) We first discuss variable types, dividing variables into categorical and numerical. Distinguishing these two types is important because the summary statistics and data visualization techniques appropriate for a variable usually depend on its type. This chapter also provides some discussion of data preprocessing.
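
As a rough illustration of why the variable type matters, the sketch below summarizes a single variable with frequencies and proportions when it is categorical, and with the mean, median, and standard deviation when it is numerical. It is not taken from the chapter: the function name `summarize`, the rule used to detect a numerical variable, and the sample values are all assumptions made for this example, and Python's standard library is used here in place of the R code the book itself works with.

```python
from collections import Counter
import statistics


def summarize(values):
    """Summarize one variable according to its type (hypothetical helper)."""
    # Assumption: a variable counts as numerical when every value is an
    # int or float (booleans excluded); anything else is categorical.
    is_numerical = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numerical:
        # Numerical variable: report center and spread.
        return {
            "type": "numerical",
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "sd": statistics.stdev(values),  # sample standard deviation
        }
    # Categorical variable: report frequencies and relative frequencies.
    counts = Counter(values)
    n = len(values)
    return {
        "type": "categorical",
        "counts": dict(counts),
        "proportions": {level: c / n for level, c in counts.items()},
    }


# Made-up values, for illustration only.
blood_type = ["A", "O", "O", "B", "A", "O"]
age = [34, 41, 29, 52, 47]

print(summarize(blood_type))
print(summarize(age))
```

Taking a mean of the blood types would be meaningless, while a frequency table of the ages would mostly list values that occur once, which is the practical reason for separating the two cases.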

Author information

Corresponding author

Correspondence to Babak Shahbaba.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Shahbaba, B. (2012). Data Exploration. In: Biostatistics with R. Use R!. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1302-8_2
