Abstract
This paper presents visual detection and classification of light vehicles and personnel on a mine site. We capitalise on the rapid advances of ConvNet-based object recognition, but highlight that a naive black-box approach results in a significant number of false positives. In particular, the lack of domain-specific training data and the unique landscape of a mine site cause a high error rate. We exploit the abundance of background-only images to train a k-means classifier that complements the ConvNet. Furthermore, region proposals both localise objects of interest and reduce computation. Our system is tested on over 10 km of real mine-site data, on which we were able to detect both light vehicles and personnel, and we show that introducing our background model reduces the false positive rate by an order of magnitude.
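The background-filtering idea in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the feature extractor, feature dimensionality, cluster count, and distance threshold are all placeholder assumptions, and random vectors stand in for ConvNet features of image regions.

```python
# Hypothetical sketch: fit k-means on ConvNet features of abundant
# background-only patches, then veto candidate detections whose features
# fall close to a known background cluster. All numbers are placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-ins for ConvNet features (e.g. a late fully-connected layer) of
# background-only training patches and of proposed detection regions.
background_features = rng.normal(0.0, 1.0, size=(500, 64))
candidate_features = rng.normal(0.0, 1.0, size=(10, 64))

# Fit k-means on the background-only data (k = 8 is an arbitrary choice).
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0)
kmeans.fit(background_features)

def is_background(feature, model, threshold):
    """Flag a detection as background if its feature lies within
    `threshold` of the nearest background centroid."""
    distances = np.linalg.norm(model.cluster_centers_ - feature, axis=1)
    return distances.min() < threshold

# Keep only detections that are far from every background cluster.
kept = [f for f in candidate_features
        if not is_background(f, kmeans, threshold=6.0)]
```

In this scheme the ConvNet still scores each region proposal; the k-means model acts purely as a veto on regions that resemble the mine-site background, which is where the abstract's order-of-magnitude reduction in false positives comes from.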
Notes
1. It is important to note that this is for visualisation purposes only and that the y-axis does not represent the true probability, since the final SVM layer of R-CNN was not calibrated for probabilistic outputs.
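For context on the note above, a standard way to turn raw SVM scores into calibrated probabilities is Platt scaling (fitting a sigmoid to the decision scores). The sketch below uses synthetic data and scikit-learn; it illustrates the general technique only and is not the paper's pipeline, whose SVM layer was deliberately left uncalibrated.

```python
# Illustrative Platt scaling on synthetic two-class data:
# raw SVM margins are uncalibrated, CalibratedClassifierCV fits a
# sigmoid mapping them to probabilities in [0, 1].
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 5)), rng.normal(1, 1, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

# Raw decision scores: unbounded margins, not probabilities.
svm = LinearSVC(dual=False).fit(X, y)
scores = svm.decision_function(X)

# Platt scaling: cross-validated sigmoid calibration of the same SVM.
calibrated = CalibratedClassifierCV(LinearSVC(dual=False),
                                    method="sigmoid", cv=3)
calibrated.fit(X, y)
probs = calibrated.predict_proba(X)[:, 1]
```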
Acknowledgments
This research was funded by the Australian Coal Association Research Program (ACARP). The authors also thank AngloAmerican for allowing data collection at the Dawson operation, and the high-performance computing group at Queensland University of Technology for their support and the use of their services in conducting some of the experiments in this paper.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
Bewley, A., Upcroft, B. (2016). From ImageNet to Mining: Adapting Visual Object Detection with Minimal Supervision. In: Wettergreen, D., Barfoot, T. (eds) Field and Service Robotics. Springer Tracts in Advanced Robotics, vol 113. Springer, Cham. https://doi.org/10.1007/978-3-319-27702-8_33
Print ISBN: 978-3-319-27700-4
Online ISBN: 978-3-319-27702-8