Supervised Pre-processings Are Useful for Supervised Clustering

  • Conference paper

Abstract

In recent years, researchers have turned their attention to a new approach, supervised clustering, which combines the main characteristics of traditional clustering and supervised classification. Motivated by the importance of pre-processing in traditional clustering, this paper explores to what extent supervised pre-processing steps can help standard clustering algorithms achieve better performance on supervised clustering tasks. The reported experiments show that, when supervised pre-processing steps are carried out, standard clustering algorithms are indeed competitive with existing supervised clustering algorithms.
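The abstract describes a two-stage pipeline: a supervised pre-processing step that uses the class labels to re-encode the features, followed by a standard clustering algorithm. The sketch below illustrates that idea in pure Python under illustrative assumptions. The pre-processing shown (equal-frequency discretization, then replacing each value by the Laplace-smoothed class distribution P(class | bin) of its bin) is a simple stand-in and not necessarily the method evaluated in the paper. The clustering step is plain k-means with k-means++ seeding, and agreement between clusters and classes is scored with the adjusted Rand index. All function and parameter names (`fit_supervised_encoder`, `transform`, `kmeans`, `n_bins`, `alpha`) are hypothetical.

```python
import math
import random
from bisect import bisect_right
from collections import Counter


def fit_supervised_encoder(X, y, n_bins=4, alpha=1.0):
    """Illustrative supervised pre-processing (an assumption, not the paper's
    method): per feature, learn equal-frequency bin edges and the
    Laplace-smoothed class distribution P(class | bin) from labelled data."""
    classes = sorted(set(y))
    encoder = []
    for j in range(len(X[0])):
        values = sorted(row[j] for row in X)
        # Interior equal-frequency cut points; duplicate cut points are merged.
        edges = sorted({values[len(values) * q // n_bins] for q in range(1, n_bins)})
        counts = [Counter() for _ in range(len(edges) + 1)]
        for row, label in zip(X, y):
            counts[bisect_right(edges, row[j])][label] += 1
        probs = [
            [(c[k] + alpha) / (sum(c.values()) + alpha * len(classes)) for k in classes]
            for c in counts
        ]
        encoder.append((edges, probs))
    return encoder


def transform(encoder, X):
    """Replace every raw value by the class distribution of the bin it falls in."""
    return [
        [p for (edges, probs), v in zip(encoder, row) for p in probs[bisect_right(edges, v)]]
        for row in X
    ]


def _sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Standard (unsupervised) k-means: k-means++ seeding, then Lloyd iterations."""
    rng = random.Random(seed)
    centres = [list(rng.choice(points))]
    while len(centres) < k:
        # Next centre drawn with probability proportional to squared distance.
        d2 = [min(_sqdist(p, c) for c in centres) for p in points]
        if sum(d2) == 0:  # fewer distinct points than k
            break
        centres.append(list(rng.choices(points, weights=d2)[0]))
    labels = None
    for _ in range(n_iter):
        new = [min(range(len(centres)), key=lambda i: _sqdist(p, centres[i])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for i in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centres[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions (1.0 = identical)."""
    n = len(labels_true)
    index = sum(math.comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    a = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    b = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    expected = a * b / math.comb(n, 2)
    max_index = (a + b) / 2
    if max_index == expected:
        return 1.0
    return (index - expected) / (max_index - expected)


# Toy data: feature 0 separates the classes, feature 1 is wide-range noise.
X = [(0.1, 0), (0.2, 100), (0.3, 0), (0.4, 100),
     (2.1, 0), (2.2, 100), (2.3, 0), (2.4, 100)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
encoder = fit_supervised_encoder(X, y, n_bins=2)
clusters = kmeans(transform(encoder, X), k=2)
print(adjusted_rand_index(y, clusters))  # 1.0
```

In this toy example the noisy second feature is encoded as the same uniform distribution [0.5, 0.5] in every bin, so it no longer affects distances and k-means recovers the two classes. On the raw coordinates, distances are dominated by that noisy feature, so k-means will usually split the points along it instead. This is only a sketch of the general idea, not a reproduction of the paper's experiments.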

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ismaili, O.A., Lemaire, V., Cornuéjols, A. (2016). Supervised Pre-processings Are Useful for Supervised Clustering. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_13