Using Clustering to Learn Distance Functions for Supervised Similarity Assessment

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3587)

Abstract

Assessing the similarity between objects is a prerequisite for many data mining techniques. This paper introduces a novel approach to learning distance functions that maximizes the clustering of objects belonging to the same class. Objects in a data set are clustered with respect to a given distance function, and the local class density information of each cluster is then used by a weight-adjustment heuristic to modify the distance function so that class density in the attribute space increases. This process of interleaving clustering with distance function modification is repeated until a "good" distance function has been found. We implemented our approach using the k-means clustering algorithm. We evaluated it on 7 UCI data sets with a traditional 1-nearest-neighbor (1-NN) classifier and with a compressed 1-NN classifier, called NCC, that uses the learned distance function and cluster centroids instead of all the points of the training set. The experimental results show that attribute weighting leads to statistically significant improvements in prediction accuracy over a traditional 1-NN classifier for 2 of the 7 data sets tested, whereas using NCC significantly improves the accuracy of the 1-NN classifier for 4 of the 7 data sets.
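
The abstract describes the interleaved loop (cluster, measure per-cluster class density, re-weight attributes, repeat) but not the exact weight-adjustment heuristic. The sketch below is therefore only a minimal illustration of that loop, not the authors' algorithm: the function name learn_attribute_weights, the purity-based multiplicative update, and the learning rate eta are assumptions, and scikit-learn's KMeans stands in for the paper's k-means implementation. Scaling each attribute by the square root of its weight makes standard k-means behave like k-means under a weighted Euclidean distance.

```python
# Minimal sketch of interleaving k-means with attribute-weight adjustment.
# NOTE: the weight-update heuristic below is hypothetical; the paper's own
# heuristic (driven by local class density) is not specified in the abstract.
import numpy as np
from sklearn.cluster import KMeans


def learn_attribute_weights(X, y, k=10, iterations=20, eta=0.1, seed=0):
    """Return one weight per attribute for a weighted Euclidean distance."""
    d = X.shape[1]
    w = np.ones(d)
    for _ in range(iterations):
        # Cluster under the current distance: scaling by sqrt(w) makes plain
        # k-means equivalent to k-means with the weighted distance
        # dist(x, z) = sum_i w[i] * (x[i] - z[i])**2.
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=seed).fit_predict(X * np.sqrt(w))
        update = np.zeros(d)
        for c in range(k):
            members = labels == c
            if members.sum() < 2:
                continue
            classes, counts = np.unique(y[members], return_counts=True)
            purity = counts.max() / counts.sum()      # local class density
            majority = classes[counts.argmax()]
            # Heuristic: reward attributes along which the cluster's majority
            # class is tight relative to the cluster as a whole.
            spread_all = X[members].var(axis=0) + 1e-12
            spread_maj = X[members & (y == majority)].var(axis=0)
            update += purity * (1.0 - spread_maj / spread_all)
        w *= np.exp(eta * update / k)   # multiplicative re-weighting
        w *= d / w.sum()                # keep the weights on a fixed scale
    return w
```

A compressed classifier in the spirit of NCC would then keep only the k cluster centroids, label each with its cluster's majority class, and classify a query by the nearest centroid under the learned weighted distance.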

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Eick, C.F., Rouhana, A., Bagherjeiran, A., Vilalta, R. (2005). Using Clustering to Learn Distance Functions for Supervised Similarity Assessment. In: Perner, P., Imiya, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2005. Lecture Notes in Computer Science (LNAI), vol. 3587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11510888_13

  • DOI: https://doi.org/10.1007/11510888_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26923-6

  • Online ISBN: 978-3-540-31891-0
