Preliminary Data Analysis

Blischke, Wallace R.; Rezaul Karim, M.; Prabhakar Murthy, D. N.

doi:10.1007/978-0-85729-647-4_8

Part of the book series: Springer Series in Reliability Engineering

Abstract

The objectives of preliminary data analysis are to edit the data to prepare it for further analysis, describe the key features of the data, and summarize the results. This chapter deals with quantitative and qualitative approaches to achieving these objectives. Topics covered include scales of measurement, types of data, graphical methods of analysis (including histograms, probability plots, and other graphical representations of data), and basic descriptive statistics (mean, median, fractiles, standard deviation, and so forth). The chapter concludes with a discussion of the use of probability plots in preliminary model selection.

Notes

1. Some other software packages, including Splus (http://www.insightful.com) and R (http://cran.r-project.org/), are also used in later chapters.
2. In a very real sense, probability and statistics are inverses of one another. Probability deals with models of randomness that can be used to make statements about the kinds of data that may occur. Statistics deals with the use of data to make statements about the model.
3. Related terms are percentile and quantile.
4. The exception occurs if the CDF is constant over some interval and increasing on either side of the interval.
5. Minitab removes the smallest and largest 5% of the values (using the nearest integer to 0.05n). This usually removes the values causing the distortion and provides a more meaningful measure. Other (less drastic) methods of dealing with outliers will be discussed in Chap. 9.
6. The subscript s is for Charles Spearman, who devised the measure in 1904.
7. The steps may vary with the version of the Minitab software.
8. As noted by a well-known statistician, Oscar Kempthorne, “No model is correct. But some are useful!”
9. In fact, the “goodness-of-fit” statistic is given as AD* = 0.436, which indicates a relatively good fit. This will be discussed further in Chap. 10.
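
The trimming rule in Note 5 can be illustrated with a short Python sketch. This is not Minitab's implementation: the function name `trimmed_mean`, the sample data, and the choice to round ties half up are assumptions made for illustration (the note says only "the nearest integer").

```python
def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the k smallest and k largest values.

    k is the nearest integer to proportion * n, as described in Note 5.
    Rounding ties half up is an assumption; the note does not say how
    ties are broken.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("values must not be empty")
    k = int(proportion * n + 0.5)  # nearest integer, ties rounded up
    if 2 * k >= n:
        raise ValueError("trimming would remove every observation")
    kept = data[k:n - k]  # drop k values from each end
    return sum(kept) / len(kept)


# Hypothetical sample: 19 ordinary values and one extreme value.
sample = list(range(1, 20)) + [1000]

print(sum(sample) / len(sample))  # ordinary mean: 59.5
print(trimmed_mean(sample))       # trimmed mean: 10.5
```

With n = 20 the rule removes one value from each end, so the single extreme value no longer dominates the result.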

Author information

Authors and Affiliations

Wallace R. Blischke, 5401 Katherine Avenue, Sherman Oaks, Los Angeles, CA 91401-4922, USA
M. Rezaul Karim, Department of Statistics, Rajshahi University, Rajshahi, Bangladesh
D. N. Prabhakar Murthy, School of Mechanical and Mining Engineering, The University of Queensland, Brisbane, QLD 4072, Australia

Corresponding author

Correspondence to Wallace R. Blischke.

About this chapter

Cite this chapter

Blischke, W.R., Rezaul Karim, M., Prabhakar Murthy, D.N. (2011). Preliminary Data Analysis. In: Warranty Data Collection and Analysis. Springer Series in Reliability Engineering. Springer, London. https://doi.org/10.1007/978-0-85729-647-4_8

DOI: https://doi.org/10.1007/978-0-85729-647-4_8
Published: 27 July 2011
Publisher Name: Springer, London
Print ISBN: 978-0-85729-646-7
Online ISBN: 978-0-85729-647-4
eBook Packages: Engineering, Engineering (R0)
