Study of the Convergence in Automatic Generation of Instance Level Constraints

  • Conference paper
Flexible Query Answering Systems 2015

Abstract

This work examines in greater depth a methodology that generates instance-level constraints for semi-supervised clustering by studying the inherent structure of the data. The methodology executes a partitional clustering algorithm repeatedly, so we study its behaviour as a function of the number of clustering iterations. In this scenario, we propose three stopping criteria that determine how many times the partitional clustering algorithm should be executed to obtain reliable instance-level constraints. These criteria are tested experimentally on the document clustering problem.
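
The general idea of repeating a partitional clustering and stopping once the derived constraints settle can be sketched as follows. This is only an illustration under stated assumptions, not the method from the paper: the abstract does not specify the three stopping criteria, so the rule used here (stop when the constraint sets are unchanged for `patience` consecutive runs) is hypothetical, as are the function names, the choice of k-means as the partitional algorithm, and the rule that pairs always clustered together become must-link constraints while pairs never clustered together become cannot-link constraints.

```python
import random
from itertools import combinations


def kmeans_labels(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means on tuples of floats; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Recompute each center as the mean of its members; keep it if empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def generate_constraints(points, k, max_runs=50, patience=5, seed=0):
    """Repeat k-means until the derived constraint sets stop changing.

    Hypothetical stopping rule (an assumption, not taken from the paper):
    stop once the must-link / cannot-link sets are identical for `patience`
    consecutive runs, or after `max_runs` runs.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(len(points)), 2))
    together = {pair: 0 for pair in pairs}  # co-clustering count per pair
    previous, stable, runs = None, 0, 0
    must, cannot = set(), set()
    while runs < max_runs and stable < patience:
        labels = kmeans_labels(points, k, rng)
        runs += 1
        for i, j in pairs:
            together[(i, j)] += int(labels[i] == labels[j])
        # Co-clustered in every run so far: must-link; in no run: cannot-link.
        must = {p for p, n in together.items() if n == runs}
        cannot = {p for p, n in together.items() if n == 0}
        stable = stable + 1 if (must, cannot) == previous else 0
        previous = (must, cannot)
    return must, cannot, runs


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    must_link, cannot_link, n_runs = generate_constraints(data, k=2)
    print(sorted(must_link), len(cannot_link), n_runs)
```

On data with two well-separated groups, as in the usage example, the pairs inside each group end up as must-link constraints and the pairs across groups as cannot-link constraints. On noisier data the two sets shrink as more runs disagree, which is why the number of runs before stopping matters.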

Author information

Corresponding author

Correspondence to Jesús R. Campaña.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Diaz-Valenzuela, I., Campaña, J.R., Senatore, S., Loia, V., Vila, M.A., Martin-Bautista, M.J. (2016). Study of the Convergence in Automatic Generation of Instance Level Constraints. In: Andreasen, T., et al. Flexible Query Answering Systems 2015. Advances in Intelligent Systems and Computing, vol 400. Springer, Cham. https://doi.org/10.1007/978-3-319-26154-6_11

  • DOI: https://doi.org/10.1007/978-3-319-26154-6_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26153-9

  • Online ISBN: 978-3-319-26154-6

  • eBook Packages: Computer Science (R0)
