Empirical Analysis of Proximity Measures in Machine Learning

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 999)

Abstract

The abundance and variety of available proximity measures often poses a challenge in both supervised and unsupervised learning. The machine learning literature proposes many similarity and dissimilarity measures. These measures differ in how they address issues raised by different application domains, such as the ability to handle noise, the ability to detect various types of correlation, and the ability to cope with a large number of dimensions. In this work, we select eighteen proximity measures and apply them to two well-known distance-based learning frameworks. One framework uses a widely used supervised learning method, the KNN classifier, and the other uses an unsupervised learning method, k-means clustering.
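
As a rough illustration of the supervised framework, the sketch below plugs interchangeable proximity measures into a minimal KNN classifier. It is not the authors' code: the three measures shown (Euclidean, Manhattan, cosine distance), the function names, and the toy data are all assumptions chosen for illustration, and the eighteen measures studied in the paper are not listed in this abstract.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train_X, train_y, query, k, dist):
    """Majority vote among the k training points closest to `query` under `dist`."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class "a" near the origin, class "b" near (5, 5).
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
query = (0.5, 0.5)

for name, dist in [("euclidean", euclidean),
                   ("manhattan", manhattan),
                   ("cosine", cosine_distance)]:
    print(name, knn_predict(train_X, train_y, query, k=3, dist=dist))
```

On this toy data the Euclidean and Manhattan measures assign the query to class "a", while cosine distance, which compares direction rather than magnitude, assigns it to class "b". This is the kind of measure-dependent behaviour that motivates an empirical comparison.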

Author information

Corresponding author

Correspondence to Nazrul Hoque.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Hoque, N., Ahmed, H.A., Bhattacharyya, D.K. (2020). Empirical Analysis of Proximity Measures in Machine Learning. In: Das, A., Nayak, J., Naik, B., Pati, S., Pelusi, D. (eds) Computational Intelligence in Pattern Recognition. Advances in Intelligent Systems and Computing, vol 999. Springer, Singapore. https://doi.org/10.1007/978-981-13-9042-5_34
