International Journal of Computer Vision, Volume 126, Issue 1, pp 59–85

Efficient Label Collection for Image Datasets via Hierarchical Clustering

  • Maggie Wigness
  • Bruce A. Draper
  • J. Ross Beveridge

Abstract

Raw visual data used to train classifiers is abundant and easy to gather, but lacks semantic labels that describe visual concepts of interest. These labels are necessary for supervised learning and can require significant human effort to collect. We discuss four labeling objectives that play an important role in the design of frameworks aimed at collecting label information for large training sets while maintaining low human effort: discovery, efficiency, exploitation and accuracy. We introduce a framework that explicitly models and balances these four labeling objectives with the use of (1) hierarchical clustering, (2) a novel interestingness measure that defines structural change within the hierarchy, and (3) an iterative group-based labeling process that exploits relationships between labeled and unlabeled data. Results on benchmark data show that our framework collects labeled training data more efficiently than existing labeling techniques and trains higher performing visual classifiers. Further, we show that our resulting framework is fast and significantly reduces human interaction time when labeling real-world multi-concept imagery depicting outdoor environments.
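The group-based labeling idea summarized above can be illustrated with a minimal sketch. It is not the authors' framework: the interestingness measure and the iterative refinement are not reproduced, average linkage is an assumed choice, and the toy points, the `truth` labels, the fixed cut at `k=3`, and the function names are all hypothetical. The sketch only shows how clustering lets one human query label a whole group.

```python
import math
from itertools import combinations


def average_linkage(points, a, b):
    """Mean pairwise Euclidean distance between index lists a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def cluster_into_groups(points, k):
    """Bottom-up clustering: start from singletons and merge the
    closest pair of groups until only k groups remain."""
    groups = [[i] for i in range(len(points))]
    while len(groups) > k:
        a, b = min(combinations(groups, 2),
                   key=lambda pair: average_linkage(points, *pair))
        groups = [g for g in groups if g is not a and g is not b] + [a + b]
    return groups


def label_groups(points, groups, oracle):
    """Query the oracle once per group (at its medoid) and
    propagate the answer to every member of that group."""
    labels, queries = {}, 0
    for members in groups:
        medoid = min(members,
                     key=lambda i: sum(math.dist(points[i], points[j])
                                       for j in members))
        answer = oracle(medoid)
        queries += 1
        for i in members:
            labels[i] = answer
    return labels, queries


# Hypothetical toy features: three well-separated 2-D blobs.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),
          (10.0, 0.0), (10.1, 0.2), (9.8, 0.1)]
# Hypothetical ground truth, standing in for the human annotator.
truth = ["grass"] * 3 + ["tree"] * 3 + ["road"] * 3

groups = cluster_into_groups(points, k=3)
labels, queries = label_groups(points, groups, oracle=lambda i: truth[i])
accuracy = sum(labels[i] == truth[i] for i in range(len(points))) / len(points)
print(f"{queries} queries labeled {len(labels)} points, accuracy {accuracy:.2f}")
```

On this toy data three queries label all nine points. Real image features rarely separate this cleanly, and a group that mixes concepts would receive a wrong label for some members, which is the tension between efficiency and accuracy that the abstract describes.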

Keywords

  • Efficient label collection
  • Hierarchical clustering
  • Image classification
  • Visual concept discovery

Copyright information

© Springer Science+Business Media, LLC (outside the USA) 2017

Authors and Affiliations

  1. U.S. Army Research Laboratory, Adelphi, USA
  2. Colorado State University, Fort Collins, USA
