Skip to main content

Semi-supervised Clustering Framework Based on Active Learning for Real Data

  • Conference paper
  • First Online:
  • 1183 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11004))

Abstract

In this paper, we propose a real data clustering method based on active learning. Clustering methods are difficult to apply to real data for two reasons. First, real data may include outliers that adversely affect clustering. Second, the clustering parameters such as the number of clusters cannot be made constant because the number of classes of real data may increase as time goes by. To solve the first problem, we focus on labeling outliers. Therefore, we develop a stream-based active learning framework for clustering. The active learning framework enables us to label the outliers intensively. To solve the second problem, we also develop an algorithm to automatically set clustering parameters. This algorithm can automatically set the clustering parameters with some labeled samples. The experimental results show that our method can deal with the problems mentioned above better than the conventional clustering methods.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Halim, Z., Atif, M., Rashid, A.: Profiling players using real-world datasets: clustering the data and correlating the results with the big-five personality traits. IEEE Trans. Affect. Comput., 1–18 (2017)

    Google Scholar 

  2. Bijuraj, L.V.: Clustering and its applications. In: Proceedings of National Conference on New Horizons in IT - NCNHIT 2013, pp. 169–172 (2013)

    Google Scholar 

  3. Tran, N., Vo, B., Phung, D.: Clustering for point pattern data. In: Proceedings of the 2016 23rd International Conference on Pattern Recognition (2013)

    Google Scholar 

  4. Kamishima, T., Motoyoshi, F.: Learning from cluster examples. Mach. Learn. 53(3), 199–233 (2003)

    Article  Google Scholar 

  5. Bair, E.: Semi-supervised clustering methods. Wiley Interdisc. Rev. Comput. Stat. 5(5), 349–361 (2013)

    Article  Google Scholar 

  6. Grira, N., Crucianu, M., Boujemaa, N.: Unsupervised and semi-supervised clustering: a brief survey. In: Proceedings of the Review of Machine Learning Techniques for Processing MUSCLE European Network of Excellence (2004)

    Google Scholar 

  7. Wang, Y., Chen, S., Zhou, Z.: New semi-supervised classification method based on modified cluster assumption. IEEE Trans. Neural Netw. Learn. Syst. 23(5), 689–702 (2012)

    Article  Google Scholar 

  8. Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained k-means clustering with background knowledge. In: Proceedings of the 9th ICML, pp. 577–584 (2001)

    Google Scholar 

  9. Kohonen, T.: Self-Organizing Maps, vol. 30. Springer, Heidelberg (2001). https://doi.org/10.1007/978-3-642-56927-2

    Book  MATH  Google Scholar 

  10. Macqueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)

    Google Scholar 

  11. Martinez-Uso, A., Pla, F., Sotoca, J.: A semi-supervised Gaussian mixture model for image segmentation. In: Proceedings of 20th International Conference on Pattern Recognition, pp. 2941–2944 (2010)

    Google Scholar 

  12. Grira, N., Crucianu, M., Boujemaa, N.: Active semi-supervised fuzzy clustering. Pattern Recogn. 41(5), 1834–1844 (2008)

    Article  Google Scholar 

  13. Gosselin, P.H., Cord, M.: Active learning methods for interactive image retrieval. IEEE Trans. Image Process. 17(7), 1200–1211 (2008)

    Article  MathSciNet  Google Scholar 

  14. Ward Jr., J.H.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58(301), 236–244 (1963)

    Article  MathSciNet  Google Scholar 

  15. Narr, A., Triebel, R., Cremers, D.: Stream-based active learning for efficient and adaptive classification of 3D objects. In: Proceedings of 2016 IEEE International Conference on Robotics and Automation (2016)

    Google Scholar 

  16. Fujii, K., Kashima, H.: Budgeted stream-based active learning via adaptive submodular maximization. In: Proceedings of Conference and Workshop on Neural Information Processing Systems (2016)

    Google Scholar 

  17. Dua, D., Karra Taniskidou, E.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml&gt

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ryosuke Odate .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Odate, R., Shinjo, H., Suzuki, Y., Motobayashi, M. (2018). Semi-supervised Clustering Framework Based on Active Learning for Real Data. In: Bai, X., Hancock, E., Ho, T., Wilson, R., Biggio, B., Robles-Kelly, A. (eds) Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2018. Lecture Notes in Computer Science(), vol 11004. Springer, Cham. https://doi.org/10.1007/978-3-319-97785-0_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-97785-0_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97784-3

  • Online ISBN: 978-3-319-97785-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics