Abstract
Instance-based learning techniques classify new examples using a set of stored training instances. The most common such technique is the nearest neighbor method, in which a new instance is assigned the class of the closest training instance. A critical element of any such method is the metric used to measure the distance between instances. Euclidean distance is by far the most commonly used metric; no one, however, has systematically considered whether a different metric, such as Manhattan distance, might perform equally well on naturally occurring data sets. Some evidence from psychological research indicates that Manhattan distance might be preferable in some circumstances. This paper examines three different distance metrics and presents experimental comparisons on data from three domains: malignant cancer classification, heart disease diagnosis, and diabetes prediction. The results of these studies indicate that the Manhattan distance metric works quite well, although not better than the Euclidean metric that has become a standard for machine learning experiments. Because the nearest neighbor technique provides a good benchmark for comparisons with other learning algorithms, the results below include a number of such comparisons, which show that nearest neighbor, using any of these distance metrics, compares quite favorably with other machine learning techniques.
Supported in part by the Air Force Office of Scientific Research under Grant AFOSR-89-0151.
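The nearest neighbor rule and the two metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation; the feature vectors and labels are invented for the example.

```python
import math

def euclidean(a, b):
    # L2 metric: square root of the sum of squared feature differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # L1 metric: sum of absolute feature differences
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_neighbor(query, training, metric):
    # training: list of (feature_vector, label) pairs.
    # Classify the query with the label of the closest stored instance
    # under the given distance metric.
    _, label = min(training, key=lambda inst: metric(query, inst[0]))
    return label

# Hypothetical two-feature instances with class labels
training = [((1.0, 1.0), "benign"), ((5.0, 6.0), "malignant")]
print(nearest_neighbor((2.0, 2.0), training, euclidean))  # benign
print(nearest_neighbor((4.0, 5.0), training, manhattan))  # malignant
```

Swapping `euclidean` for `manhattan` changes only the distance computation, which is the single design choice the paper's experiments vary.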
Copyright information
© 1991 Springer-Verlag Berlin Heidelberg
Cite this paper
Salzberg, S. (1991). Distance metrics for instance-based learning. In: Ras, Z.W., Zemankova, M. (eds) Methodologies for Intelligent Systems. ISMIS 1991. Lecture Notes in Computer Science, vol 542. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-54563-8_103
Print ISBN: 978-3-540-54563-7
Online ISBN: 978-3-540-38466-3