Empirical Analysis of Proximity Measures in Machine Learning

Hoque, Nazrul; Ahmed, Hasin A.; Bhattacharyya, Dhruba Kumar

doi:10.1007/978-981-13-9042-5_34

Conference paper
First Online: 18 August 2019

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 999)

Abstract

The abundance and variety of available proximity measures often pose a challenge in both supervised and unsupervised learning. Many similarity and dissimilarity measures have been proposed in the machine learning literature. These measures differ in how they address issues raised by different application domains, such as the ability to handle noise, the ability to detect various types of correlation, and the ability to cope with a large number of dimensions. In this work, we select eighteen proximity measures and apply them in two well-known distance-based learning frameworks. One framework uses a widely used supervised learning method, the KNN classifier, and the other uses an unsupervised learning method, k-means clustering.
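To make the setup concrete, the following is a minimal sketch of how a proximity measure can be treated as a pluggable component of both frameworks. It is not the authors' implementation: the three measures shown, the function names (`knn_predict`, `kmeans`), and the toy data are assumptions chosen for illustration, and the abstract does not list which eighteen measures were evaluated. The k-means sketch also keeps the arithmetic-mean centroid update for every measure, which is a simplification (the mean is only the natural centre for squared Euclidean distance).

```python
import math
from collections import Counter

# Three illustrative proximity measures (assumed for this sketch; the
# abstract does not name the eighteen measures used in the paper).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_dissimilarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def knn_predict(train_X, train_y, query, k, dist):
    """Majority vote among the k training points closest to `query` under `dist`."""
    neighbours = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def kmeans(points, init_centroids, dist, iters=10):
    """Lloyd-style k-means whose assignment step uses `dist`.

    Simplification: centroids are always updated as arithmetic means,
    regardless of which proximity measure is plugged in.
    """
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: recompute each non-empty cluster's mean.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: two well-separated groups in the plane.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
y = ["a", "a", "a", "b", "b", "b"]

knn_results = {
    name: knn_predict(X, y, (1.1, 1.0), 3, fn)
    for name, fn in [("euclidean", euclidean),
                     ("manhattan", manhattan),
                     ("cosine", cosine_dissimilarity)]
}
cluster_labels, cluster_centroids = kmeans(X, [X[0], X[3]], euclidean)

print(knn_results)
print(cluster_labels)
```

Because the learner only sees the measure through the `dist` argument, the same data can be run through every candidate measure and the resulting predictions or cluster assignments compared. Magnitude-based measures and angle-based measures can rank the same neighbours differently, which is the kind of variation an empirical comparison is meant to expose.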

Author information

Authors and Affiliations

Department of CSE, Tezpur University, Sonitpur, 784028, Assam, India
Nazrul Hoque & Dhruba Kumar Bhattacharyya
Department of Information and Computer Science, Assam Women’s University, Jorhat, 785004, Assam, India
Hasin A. Ahmed

Corresponding author

Correspondence to Nazrul Hoque.

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Howrah, West Bengal, India
Asit Kumar Das
Department of Computer Science and Engineering, Sri Sivani College of Engineering, Srikakulam, Andhra Pradesh, India
Janmenjoy Nayak
Department of Computer Application, Veer Surendra Sai University of Technology, Burla, Sambalpur, Odisha, India
Bighnaraj Naik
Department of Bioinformatics, Maulana Abul Kalam Azad University of Technology, Kolkata, West Bengal, India
Soumen Kumar Pati
Faculty of Communication Sciences, University of Teramo, Teramo, Italy
Danilo Pelusi

Cite this paper

Hoque, N., Ahmed, H.A., Bhattacharyya, D.K. (2020). Empirical Analysis of Proximity Measures in Machine Learning. In: Das, A., Nayak, J., Naik, B., Pati, S., Pelusi, D. (eds) Computational Intelligence in Pattern Recognition. Advances in Intelligent Systems and Computing, vol 999. Springer, Singapore. https://doi.org/10.1007/978-981-13-9042-5_34

DOI: https://doi.org/10.1007/978-981-13-9042-5_34
Published: 18 August 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9041-8
Online ISBN: 978-981-13-9042-5
eBook Packages: Intelligent Technologies and Robotics (R0)
