Preliminary Data Analysis

Warranty Data Collection and Analysis

Abstract

The objectives of preliminary data analysis are to edit the data to prepare it for further analysis, describe the key features of the data, and summarize the results. This chapter deals with quantitative and qualitative approaches to achieving these objectives. Topics covered include scales of measurement, types of data, graphical methods of analysis (including histograms, probability plots, and other graphical representations of data), and basic descriptive statistics (mean, median, fractiles, standard deviation, and so forth). The chapter concludes with a discussion of the use of probability plots in preliminary model selection.


Notes

  1. Some other software packages, including S-Plus (http://www.insightful.com) and R (http://cran.r-project.org/), are also used in later chapters.

  2. In a very real sense, probability and statistics are inverses of one another. Probability deals with models of randomness that can be used to make statements about the kinds of data that may occur. Statistics deals with the use of data to make statements about the model.

  3. Related terms are percentile and quantile.

  4. The exception occurs if the CDF is constant over some interval and increasing on either side of the interval.

  5. In computing the trimmed mean, Minitab removes the smallest and largest 5% of the values (using the nearest integer to 0.05n). This usually removes the values causing the distortion and provides a more meaningful measure. Other (less drastic) methods of dealing with outliers will be discussed in Chap. 9.

  6. The subscript s is for Charles Spearman, who devised the measure in 1904.

  7. The steps may vary with the version of Minitab used.

  8. As noted by a well-known statistician, Oscar Kempthorne, “No model is correct. But some are useful!”

  9. In fact, the “goodness-of-fit” statistic is given as AD* = 0.436, which indicates a relatively good fit. This will be discussed further in Chap. 10.
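
The trimming rule described in Note 5 can be illustrated with a minimal sketch. It assumes that the same count of values, the nearest integer to 0.05n, is dropped from each end of the sorted sample before averaging. The function name `trimmed_mean`, the tie-handling in the rounding, and the sample values are hypothetical choices made for illustration; this is not Minitab's implementation.

```python
from statistics import mean


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the k smallest and k largest values,
    where k is the nearest integer to proportion * n."""
    data = sorted(values)
    n = len(data)
    # Nearest integer to proportion * n; rounding ties upward is an assumption.
    k = int(proportion * n + 0.5)
    if 2 * k >= n:
        raise ValueError("trimming would remove every observation")
    return mean(data[k:n - k])


# Hypothetical sample of 20 values with one extreme observation (250).
sample = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7,
          7, 8, 8, 9, 9, 10, 10, 11, 12, 250]

raw = mean(sample)              # pulled upward by the extreme value
trimmed = trimmed_mean(sample)  # k = 1, so 2 and 250 are dropped

print(f"mean = {raw:.2f}, trimmed mean = {trimmed:.2f}")
```

With n = 20 the rule removes one value from each end, so the single extreme observation no longer dominates the average.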

Author information

Correspondence to Wallace R. Blischke.

Copyright information

© 2011 Springer-Verlag London Limited

About this chapter

Cite this chapter

Blischke, W.R., Rezaul Karim, M., Prabhakar Murthy, D.N. (2011). Preliminary Data Analysis. In: Warranty Data Collection and Analysis. Springer Series in Reliability Engineering. Springer, London. https://doi.org/10.1007/978-0-85729-647-4_8

  • DOI: https://doi.org/10.1007/978-0-85729-647-4_8

  • Publisher Name: Springer, London

  • Print ISBN: 978-0-85729-646-7

  • Online ISBN: 978-0-85729-647-4

  • eBook Packages: Engineering, Engineering (R0)
