Abstract
This chapter introduces what we believe is a novel approach that allows a trained classifier to “know what it doesn’t know”: predictions for new cases can be flagged as unreliable if they lie in a region of feature space where the classifier’s knowledge is likely to be faulty. We show how this approach may also be used to compare alternative classifiers trained on the same data in terms of their uncertainty areas; the better classifier is the one with the smaller uncertainty area. We illustrate three approaches to defining these uncertainty areas, and we expect further research to identify newer and likely better ways. This work is significantly less mature than that of previous chapters; it represents a very recent insight, and the methods described are quite heuristic. We look forward to others taking this approach in more rigorous directions.
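The core idea can be sketched very simply. All names and the distance-radius rule below are hypothetical illustrations, not the chapter's own constructions (which are more elaborate): a new case is flagged as unreliable if it falls close to any Trouble-Maker (TM), i.e., a training case the classifier got wrong.

```python
import numpy as np

def uncertainty_flags(X_train, y_train, y_pred_train, X_new, radius=1.0):
    """Flag new cases that lie near any Trouble-Maker (TM):
    a training case the classifier misclassified."""
    tms = X_train[y_train != y_pred_train]  # TM feature vectors
    flags = []
    for x in X_new:
        # distance from x to every TM; unreliable if any TM is close
        d = np.linalg.norm(tms - x, axis=1)
        flags.append(bool(d.size and d.min() <= radius))
    return np.array(flags)

# toy data: the classifier misclassified the training case at (0, 0)
X_train = np.array([[0.0, 0.0], [5.0, 5.0]])
y_train = np.array([0, 1])
y_pred_train = np.array([1, 1])  # first case is a TM
X_new = np.array([[0.5, 0.0], [5.0, 4.5]])
print(uncertainty_flags(X_train, y_train, y_pred_train, X_new))
# the first new case lies near the TM and is flagged; the second is not
```

A fixed radius is only the crudest way to turn TMs into an uncertainty area; the chapter's three approaches define these regions more carefully.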
Notes
- 1.
For instance, see Fig. 5.15. This example scatter plot for some of our speech data illustrates how shifting the threshold up or down may alter the numbers of false positives or negatives.
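The threshold trade-off mentioned here is easy to demonstrate numerically. This is a generic sketch (the scores and labels are made up, not the speech data of Fig. 5.15): raising the decision threshold trades false positives for false negatives.

```python
import numpy as np

def fp_fn_at_threshold(scores, labels, threshold):
    """Count false positives and false negatives when cases with
    score >= threshold are called positive (label 1)."""
    pred = scores >= threshold
    fp = int(np.sum(pred & (labels == 0)))
    fn = int(np.sum(~pred & (labels == 1)))
    return fp, fn

scores = np.array([0.1, 0.4, 0.6, 0.9])
labels = np.array([0, 1, 0, 1])
for t in (0.3, 0.5, 0.7):
    print(t, fp_fn_at_threshold(scores, labels, t))
# as the threshold rises, false positives fall and false negatives rise
```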
- 2.
Even if the TM sets are the same for two different classifiers, their uncertainty areas may still differ, since their estimated errors on each case may well differ. However, when differences start to get small, one may question how accurate any MOPs derived from a modest data set are.
- 3.
The careful reader may count seven visible polygons in Fig. 9.11. Two more are actually present but obscured: they belong to the two NL cases whose blue solid circles are larger than these two small polygons and thus hide them.
- 4.
We used the spline function in R (stats package) and had it return the estimated cubic splines.
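For readers working outside R, the same cubic-spline fit can be reproduced elsewhere. This is a sketch using Python's `scipy.interpolate.CubicSpline` rather than R's `stats::spline`; the knot values are arbitrary illustrations.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# knots: a few (x, y) points to smooth, analogous to the vector
# of values fed to R's stats::spline
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 0.0, 1.0])

cs = CubicSpline(x, y)          # fit the cubic spline
xs = np.linspace(0, 3, 7)
print(np.round(cs(xs), 3))      # evaluate on a finer grid
# the spline passes exactly through each knot
```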
Abbreviations
- 2D: Two dimensional
- GA-SVM: Genetic algorithm-support vector machine hybrid
- GRNN: Generalized regression neural network
- MMSE: Mini-mental state exam
- MOP: Measure of performance
- ROC: Receiver operating characteristic
- SNE: Stochastic neighbor embedding
- TM: Trouble-Makers (training cases a learning classifier gets wrong)
- t-SNE: Student’s t-distribution SNE
References
Donaldson J (2016) An R package for t-SNE (t-distributed Stochastic Neighbor Embedding). GitHub, last commit 2016. https://github.com/jdonaldson/rtsne/
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Shepard R (1962) The analysis of proximities: multidimensional scaling with an unknown distance function (parts 1 and 2). Psychometrika 27:125–140, 219–249
van der Maaten L, Hinton GE (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Land, W.H., Schaffer, J.D. (2020). Quantifying Uncertainty. In: The Art and Science of Machine Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-18496-4_9
Print ISBN: 978-3-030-18495-7
Online ISBN: 978-3-030-18496-4
eBook Packages: Engineering (R0)