Abstract
Suppose you have evaluated a classifier’s performance on an independent testing set. To what extent can you trust your findings? When a flipped coin comes up heads eight times out of ten, any reasonable experimenter will suspect this to be nothing but a fluke, expecting that another set of ten tosses will give a result closer to reality. Similar caution is in order when measuring classification performance. Evaluating classification accuracy on a testing set is not enough; just as important is to develop some notion of the chances that the measured value is a reliable estimate of the classifier’s true behavior.
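The coin-flip intuition above can be made concrete with a short calculation. The sketch below (an illustration, not code from the chapter) first computes how likely a fair coin is to show eight or more heads in ten tosses, and then applies the same binomial reasoning to a classifier: the standard error of an accuracy estimate shrinks with the size of the testing set, so a figure measured on only a handful of examples is far less trustworthy than the same figure measured on a thousand.

```python
import math

def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Probability that a *fair* coin shows 8 or more heads in 10 tosses.
# At roughly 5.5%, this is hardly negligible -- the "80% heads"
# observation may well be a fluke.
p_fluke = binom_tail(10, 8)

def accuracy_standard_error(acc, n):
    """Standard error of an accuracy estimate measured on n test examples."""
    return math.sqrt(acc * (1 - acc) / n)

# The same measured accuracy, 0.80, is far more reliable when it comes
# from a large testing set than from a tiny one.
se_small = accuracy_standard_error(0.80, 10)    # large uncertainty
se_large = accuracy_standard_error(0.80, 1000)  # much smaller uncertainty
```

A testing set 100 times larger shrinks the standard error by a factor of 10, which is exactly the caution the abstract urges: report not just the accuracy, but some notion of how far off it might be.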
Notes
- 1.
- 2. As explained in Sect. 12.1 in connection with the distribution of results obtained from different samples, we prefer the term standard error to the more general standard deviation.
- 3. With more degrees of freedom, the curve would get closer to the normal distribution, becoming almost indistinguishable from it for 30 or more degrees of freedom.
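The last note, on the t-distribution approaching the normal curve, can be checked numerically. The following sketch (an illustration using the standard density formulas, not code from the chapter) evaluates the Student t density at zero and shows that the gap to the standard normal density shrinks as the degrees of freedom grow, becoming tiny by 30 degrees of freedom.

```python
import math

def t_pdf_at_zero(df):
    """Student t density at x = 0 for the given degrees of freedom."""
    return math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

# Standard normal density at x = 0: 1/sqrt(2*pi), about 0.3989.
normal_at_zero = 1 / math.sqrt(2 * math.pi)

# The t curve's gap to the normal curve narrows as df increases.
gap_5 = abs(t_pdf_at_zero(5) - normal_at_zero)    # noticeable gap
gap_30 = abs(t_pdf_at_zero(30) - normal_at_zero)  # under 1% of the value
```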
Copyright information
© 2017 Springer International Publishing AG
Cite this chapter
Kubat, M. (2017). Statistical Significance. In: An Introduction to Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-63913-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63912-3
Online ISBN: 978-3-319-63913-0
eBook Packages: Computer Science (R0)