Abstract
The abundance and variety of available proximity measures often poses a challenge in both supervised and unsupervised learning. Many similarity and dissimilarity measures have been proposed in the machine learning literature. These measures differ in how they address issues raised by different application domains, such as the ability to handle noise, the ability to detect various types of correlation, and the ability to cope with a large number of dimensions. In this work, we select eighteen proximity measures and apply them to two well-known distance-based learning frameworks. One framework uses a widely used supervised learning method, the KNN classifier, and the other uses an unsupervised learning method, k-means clustering.
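To illustrate why the choice of proximity measure matters in a distance-based learner, the following minimal sketch plugs three common measures into a small KNN classifier. It is not the authors' experimental setup: the toy data, the function names (`knn_predict`, `compare_measures`), and the three measures chosen here are assumptions made purely for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_dissimilarity(a, b):
    """One minus cosine similarity; depends on direction, not magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # convention chosen here for zero vectors
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train, query, k, dist):
    """Majority vote among the k training points closest to query under dist.

    train is a list of (vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class "a" lies along the diagonal,
# class "b" lies close to the horizontal axis.
TRAIN = [
    ((1.0, 1.0), "a"),
    ((2.0, 2.0), "a"),
    ((3.0, 3.0), "a"),
    ((1.0, 0.0), "b"),
    ((2.0, 0.5), "b"),
    ((3.0, 0.0), "b"),
]
QUERY = (0.5, 0.4)

MEASURES = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "cosine": cosine_dissimilarity,
}


def compare_measures(train, query, k=3):
    """Return the KNN prediction obtained under each proximity measure."""
    return {name: knn_predict(train, query, k, dist)
            for name, dist in MEASURES.items()}


if __name__ == "__main__":
    for name, label in compare_measures(TRAIN, QUERY).items():
        print(f"{name}: {label}")
```

On this toy data the query lies close to the class "b" points in absolute position but points in nearly the same direction as the class "a" points, so the two magnitude-based measures and the direction-based measure predict different labels for the same query and the same k.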
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hoque, N., Ahmed, H.A., Bhattacharyya, D.K. (2020). Empirical Analysis of Proximity Measures in Machine Learning. In: Das, A., Nayak, J., Naik, B., Pati, S., Pelusi, D. (eds) Computational Intelligence in Pattern Recognition. Advances in Intelligent Systems and Computing, vol 999. Springer, Singapore. https://doi.org/10.1007/978-981-13-9042-5_34
DOI: https://doi.org/10.1007/978-981-13-9042-5_34
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9041-8
Online ISBN: 978-981-13-9042-5
eBook Packages: Intelligent Technologies and Robotics (R0)