
Semi-supervised Probabilistic Distance Clustering and the Uncertainty of Classification

  • Conference paper
  • In: Advances in Data Analysis, Data Handling and Business Intelligence

Abstract

Semi-supervised clustering is an attempt to reconcile clustering (unsupervised learning) and classification (supervised learning, which uses prior information on the data). These two modes of data analysis are combined in a parameterized model in which the parameter θ ∈ [0, 1] is the weight attributed to the prior information: θ = 0 corresponds to clustering, and θ = 1 to classification. The results (cluster centers, classification rule) depend on θ. Insensitivity to θ indicates that the prior information is in agreement with the intrinsic cluster structure, and is therefore redundant; this explains why some data sets (such as the Wisconsin breast cancer data; Merz and Murphy, UCI repository of machine learning databases, University of California, Irvine, CA) give good results for all reasonable classification methods. The uncertainty of classification is represented here by the geometric mean of the membership probabilities, shown to be an entropic distance related to the Kullback–Leibler divergence.
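One elementary way to see the link between the geometric mean of the membership probabilities p = (p₁, …, p_K) and the Kullback–Leibler divergence is the following standard identity (a sketch of the connection, not necessarily the paper's exact formulation): with u the uniform distribution over the K clusters,

```latex
\left(\prod_{k=1}^{K} p_k\right)^{1/K}
  \;=\; \exp\!\Big(\tfrac{1}{K}\sum_{k=1}^{K}\log p_k\Big)
  \;=\; \frac{1}{K}\, e^{-D_{\mathrm{KL}}(u \,\|\, p)},
\qquad u = \Big(\tfrac{1}{K},\dots,\tfrac{1}{K}\Big).
```

The geometric mean is thus maximal (equal to 1/K) exactly when the memberships are uniform, i.e. when the classification is most uncertain, and it decays exponentially in the divergence of p from uniformity.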

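The probabilistic distance clustering framework underlying the paper (Ben-Israel and Iyigun 2008) can be sketched in a few lines: membership probabilities are inversely proportional to the distances to the cluster centers (so that p_k(x)·d_k(x) is the same for all k), and centers are updated as weighted means with weights p_k²/d_k. Below is a minimal sketch; the convex blend (1 − θ)·p + θ·q of the distance-based probabilities p with prior label probabilities q is an illustrative assumption for how θ might enter, and `pd_cluster` is a hypothetical name, not the authors' code.

```python
import numpy as np

def pd_cluster(X, K, theta=0.0, prior=None, iters=50, seed=0, eps=1e-12):
    """Probabilistic distance clustering with prior-information weight theta.

    theta = 0 ignores the prior (pure clustering); theta = 1 uses only the
    prior label probabilities `prior` (pure classification).
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(iters):
        # Distances of every point to every center, shape (n, K).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
        # Memberships inversely proportional to distance: p_k * d_k = const.
        inv = 1.0 / d
        p = inv / inv.sum(axis=1, keepdims=True)
        if prior is not None:
            p = (1.0 - theta) * p + theta * prior  # blend in the prior (assumed form)
        # Weiszfeld-type center update with weights p_k^2 / d_k.
        w = p ** 2 / d
        centers = (w[:, :, None] * X[:, None, :]).sum(axis=0) / w.sum(axis=0)[:, None]
    # Uncertainty of classification: geometric mean of the memberships.
    uncertainty = np.prod(p, axis=1) ** (1.0 / K)
    return centers, p, uncertainty
```

For K = 2 the uncertainty is √(p₁p₂), which is at most 1/2 and attains that bound only at points whose two memberships are equal.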

References

  • Aczél, J. (1984). Measuring information beyond communication theory – Why some generalized information measures may be useful, others not. Aequationes Mathematicae, 27, 1–19.

  • Arav, M. (2008). Contour approximation of data and the harmonic mean. Journal of Mathematical Inequalities, 2, 161–167.

  • Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2005). Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research, 6, 937–965.

  • Ben-Israel, A., & Iyigun, C. (2008). Probabilistic distance clustering. Journal of Classification, 25, 5–26.

  • Ben-Tal, A., Ben-Israel, A., & Teboulle, M. (1991). Certainty equivalents and information measures: Duality and extremal principles. Journal of Mathematical Analysis and Applications, 157, 211–236.

  • Ben-Tal, A., & Teboulle, M. (1987). Penalty functions and duality in stochastic programming via ϕ-divergence functionals. Mathematics of Operations Research, 12, 224–240.

  • Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum.

  • Chapelle, O., Schölkopf, B., & Zien, A. (Eds.) (2006). Semi-supervised learning. Cambridge, MA: MIT Press.

  • Csiszár, I. (1978). Information measures: A critical survey. In Trans. 7th Prague Conf. on Info. Th., Statist., Decis. Funct., Random Processes and 8th European Meeting of Statist. (Vol. B, pp. 73–86). Prague: Academia.

  • Dixon, K. R., & Chapman, J. A. (1980). Harmonic mean measure of animal activity areas. Ecology, 61, 1040–1044.

  • Grira, N., Crucianu, M., & Boujemaa, N. (2005). Unsupervised and semi-supervised clustering: A brief survey. In A review of machine learning techniques for processing multimedia content. Report of the MUSCLE European Network of Excellence.

  • Höppner, F., Klawonn, F., Kruse, R., & Runkler, T. (1999). Fuzzy cluster analysis. New York: Wiley.

  • Iyigun, C., & Ben-Israel, A. (2008). Probabilistic distance clustering adjusted for cluster size. Probability in the Engineering and Informational Sciences, 22, 1–19.

  • Iyigun, C., & Ben-Israel, A. (2009). Contour approximation of data: The dual problem. Linear Algebra and Its Applications, 430, 2771–2780.

  • Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31, 264–323.

  • Kuhn, H. W. (1967). On a pair of dual nonlinear programs. In J. Abadie (Ed.), Methods of nonlinear programming (pp. 38–54). Amsterdam: North-Holland.

  • Kuhn, H. W. (1973). A note on Fermat’s problem. Mathematical Programming, 4, 98–107.

  • Kullback, S. (1959). Information theory and statistics. New York: Wiley.

  • Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22, 79–86.

  • Lim, T.-S., Loh, W.-Y., & Shih, Y.-S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228.

  • Luce, R. D. (1959). Individual choice behavior. New York: Wiley.

  • Mangasarian, O. L., Setiono, R., & Wolberg, W. H. (1999). Pattern recognition via linear programming: Theory and application to medical diagnosis. In T. Coleman & Y. Li (Eds.), Large-scale numerical optimization (pp. 22–30). Philadelphia: SIAM Publications.

  • Merz, C., & Murphy, P. (1996). UCI repository of machine learning databases. Irvine, CA: Department of Information and Computer Science, University of California. Retrieved from http://www.ics.uci.edu/mlearn/MLRepository.html.

  • Teboulle, M. (2007). A unified continuous optimization framework for center-based clustering methods. Journal of Machine Learning Research, 8, 65–102.

  • Weiszfeld, E. (1937). Sur le point pour lequel la somme des distances de n points donnés est minimum. Tohoku Mathematical Journal, 43, 355–386.

  • Wolberg, W. H., & Mangasarian, O. L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences of the USA, 87, 9193–9196.

  • Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning with application to clustering with side-information. In Advances in neural information processing systems (Vol. 15). Cambridge, MA: MIT Press.

  • Yellott, J. I., Jr. (2001). Luce’s choice axiom. In N. J. Smelser & P. B. Baltes (Eds.), International encyclopedia of the social and behavioral sciences (pp. 9094–9097). Oxford: Elsevier. ISBN 0-08-043076-7.


Author information

Correspondence to Adi Ben-Israel.


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg


Cite this paper

Iyigun, C., Ben-Israel, A. (2009). Semi-supervised Probabilistic Distance Clustering and the Uncertainty of Classification. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_1
