
From ImageNet to Mining: Adapting Visual Object Detection with Minimal Supervision


Part of the book series: Springer Tracts in Advanced Robotics ((STAR,volume 113))

Abstract

This paper presents visual detection and classification of light vehicles and personnel on a mine site. We capitalise on the rapid advances of ConvNet-based object recognition, but highlight that a naive black-box approach results in a significant number of false positives. In particular, the lack of domain-specific training data and the unique landscape of a mine site cause a high rate of errors. We exploit the abundance of background-only images to train a k-means classifier that complements the ConvNet. Furthermore, region proposals enable localisation of objects of interest and a reduction in computation. Our system is tested on over 10 km of real mine site data, on which it detects both light vehicles and personnel. We show that the introduction of our background model reduces the false positive rate by an order of magnitude.
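The background-model idea above can be sketched in a few lines. This is a minimal, illustrative sketch only: the ConvNet feature extractor is stood in for by synthetic 16-dimensional vectors, and the function names, cluster count, and distance threshold are assumptions, not the authors' configuration. Background-only features are clustered with k-means, and a candidate detection is vetoed when its feature lies close to a background cluster centre.

```python
# Sketch of the background model: cluster ConvNet features of
# background-only patches with k-means, then veto detections whose
# features sit near a background cluster centre. All dimensions and
# thresholds here are illustrative assumptions.
import numpy as np

def fit_background_centroids(features, k=4, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over background feature vectors."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each feature to its nearest centroid
        d = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # skip empty clusters
                centroids[j] = features[labels == j].mean(axis=0)
    return centroids

def is_background(feature, centroids, threshold=1.0):
    """Veto a proposal whose feature lies within `threshold` of a centroid."""
    return np.min(np.linalg.norm(centroids - feature, axis=1)) < threshold

# Toy usage with synthetic 16-D "features" standing in for ConvNet activations.
rng = np.random.default_rng(1)
bg = rng.normal(0.0, 0.1, size=(200, 16))  # background-like cluster near zero
centroids = fit_background_centroids(bg, k=4)
print(is_background(np.zeros(16), centroids))      # → True (near background)
print(is_background(np.full(16, 5.0), centroids))  # → False (far from background)
```

Only proposals that survive this veto would then be passed to the ConvNet classifier, which is how the abstract's reduction in false positives and in computation would both arise.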


Notes

  1.

    It is important to note that this is for visualisation purposes only and that the y-axis does not represent the true probability since the final SVM layer of RCNN was not calibrated for probabilistic outputs.
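The calibration caveat in this note can be made concrete with a small numeric sketch of Platt scaling, which maps raw margins to probabilities by fitting p(y=1 | s) = 1 / (1 + exp(A·s + B)) to held-out (score, label) pairs. The scores below are synthetic stand-ins for SVM margins, and the simple gradient-descent fit is an illustrative assumption, not the R-CNN pipeline itself.

```python
# Illustration of why raw SVM margins are not probabilities: Platt
# scaling fits sigmoid parameters (A, B) so that scores can be read
# on a [0, 1] probability scale. Synthetic data; A and B are fit by
# plain gradient descent on the logistic loss.
import numpy as np

def platt_scale(scores, labels, lr=0.01, iters=5000):
    """Fit p(y=1|s) = 1 / (1 + exp(A*s + B)) by gradient descent."""
    A, B = -1.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(A * scores + B))
        grad = p - labels  # derivative of logistic loss w.r.t. the logit
        A -= lr * np.mean(-grad * scores)
        B -= lr * np.mean(-grad)
    return A, B

# Synthetic margins: positives score high, negatives low.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(2.0, 1.0, 100), rng.normal(-2.0, 1.0, 100)])
labels = np.concatenate([np.ones(100), np.zeros(100)])
A, B = platt_scale(scores, labels)
prob = 1.0 / (1.0 + np.exp(A * 3.0 + B))  # well above 0.5 for a large margin
print(round(prob, 3))
```

With such a mapping fitted on a validation set, the detector's scores could be plotted and thresholded on a probability scale rather than as raw margins.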



Acknowledgments

This research was funded by the Australian Coal Association Research Program (ACARP). The authors would also like to acknowledge AngloAmerican for allowing data collection at the Dawson operation. They also thank the high-performance computing group at Queensland University of Technology for support and the use of its services in conducting some of the experiments in this paper.

Corresponding author

Correspondence to Alex Bewley.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Bewley, A., Upcroft, B. (2016). From ImageNet to Mining: Adapting Visual Object Detection with Minimal Supervision. In: Wettergreen, D., Barfoot, T. (eds) Field and Service Robotics. Springer Tracts in Advanced Robotics, vol 113. Springer, Cham. https://doi.org/10.1007/978-3-319-27702-8_33


  • DOI: https://doi.org/10.1007/978-3-319-27702-8_33


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27700-4

  • Online ISBN: 978-3-319-27702-8

