Abstract
This work examines in greater depth a methodology for generating instance-level constraints for semi-supervised clustering by studying the inherent nature of the data. The methodology executes a partitional clustering algorithm repeatedly, so we study its behaviour as a function of the number of clustering runs. In this setting, we propose three stopping criteria that determine how many times the partitional clustering algorithm should be executed to obtain reliable instance-level constraints. These criteria are tested experimentally on the document clustering problem.
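The abstract does not name the partitional algorithm, the rule that turns repeated runs into constraints, or the three stopping criteria. The following is therefore only a minimal sketch of the general idea, under stated assumptions: a plain k-means stands in for the partitional algorithm, pairs that always share a cluster become must-link constraints, pairs that never do become cannot-link constraints, and the loop stops once the constraint sets have stayed unchanged for `patience` consecutive runs. The names `kmeans`, `generate_constraints`, `patience`, and `max_runs` are hypothetical and do not come from the paper.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, max_iter=50):
    """Plain Lloyd's k-means (assumed stand-in); returns one label per point."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def generate_constraints(points, k, patience=5, max_runs=100, seed=0):
    """Repeat clustering until the constraint sets are stable (assumed criterion)."""
    rng = random.Random(seed)
    pairs = list(combinations(range(len(points)), 2))
    together = {pair: 0 for pair in pairs}  # runs in which a pair co-clustered
    must, cannot = set(), set()
    previous, stable, runs = None, 0, 0
    while runs < max_runs and stable < patience:
        labels = kmeans(points, k, rng)
        runs += 1
        for i, j in pairs:
            together[(i, j)] += int(labels[i] == labels[j])
        must = {pair for pair, n in together.items() if n == runs}
        cannot = {pair for pair, n in together.items() if n == 0}
        stable = stable + 1 if (must, cannot) == previous else 0
        previous = (must, cannot)
    return must, cannot, runs


# Two well-separated toy groups: indices 0-2 and 3-5.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
must_link, cannot_link, runs = generate_constraints(points, k=2)
print(sorted(must_link), len(cannot_link), runs)
```

On real document collections the points would be document vectors, and pairs that co-cluster in only some runs would simply yield no constraint. Other stopping rules, such as a fixed run budget or a threshold on how many constraints change between runs, slot into the same loop condition.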
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Diaz-Valenzuela, I., Campaña, J.R., Senatore, S., Loia, V., Vila, M.A., Martin-Bautista, M.J. (2016). Study of the Convergence in Automatic Generation of Instance Level Constraints. In: Andreasen, T., et al. Flexible Query Answering Systems 2015. Advances in Intelligent Systems and Computing, vol 400. Springer, Cham. https://doi.org/10.1007/978-3-319-26154-6_11
DOI: https://doi.org/10.1007/978-3-319-26154-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26153-9
Online ISBN: 978-3-319-26154-6
eBook Packages: Computer Science, Computer Science (R0)