A comprehensive method to reject detection outliers by combining a template descriptor with sparse 3D point clouds
We use a template descriptor on the image to find the object. However, the sparse 3D point cloud of the world is not used at all when searching for the object in the images. Since detection produces many false alarms, we explore how to combine the image detections with the 3D point cloud in order to reject detection outliers. In this experiment, semi-direct monocular visual odometry (SVO) provides 3D point coordinates and camera poses, which are used to project the 3D points to 2D image coordinates. By un-projecting the points that fall inside a tracking on the selection tree (TST) detection box back into 3D space, we can fit a 3D Gaussian ellipsoid to estimate the object's scale. Ruling out detected objects whose scales differ from the expected one rejects most of the detection outliers.
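The rejection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(x0, y0, x1, y1)` box format, the pinhole projection model, and the scale thresholds are all assumptions made for the example.

```python
import numpy as np

def project_points(K, T_cw, pts_w):
    """Project Nx3 world points into the image with intrinsics K (3x3)
    and world-to-camera pose T_cw (4x4). Returns pixel coords and depths."""
    pts_h = np.hstack([pts_w, np.ones((len(pts_w), 1))])  # homogeneous coords
    pts_c = (T_cw @ pts_h.T).T[:, :3]                      # camera frame
    uv = (K @ pts_c.T).T
    return uv[:, :2] / uv[:, 2:3], pts_c[:, 2]

def ellipsoid_semi_axes(pts_w):
    """Fit a 3D Gaussian to the points; the square roots of the covariance
    eigenvalues give the semi-axis lengths of the fitted ellipsoid."""
    eigvals = np.linalg.eigvalsh(np.cov(pts_w.T))
    return np.sqrt(np.maximum(eigvals, 0.0))

def accept_detection(K, T_cw, map_pts, box, scale_range):
    """Keep a detection only if the sparse map points projecting inside its
    bounding box form an ellipsoid whose largest semi-axis lies within the
    scale range expected for the target object (an assumed prior)."""
    uv, depth = project_points(K, T_cw, map_pts)
    x0, y0, x1, y1 = box
    inside = ((depth > 0)
              & (uv[:, 0] >= x0) & (uv[:, 0] <= x1)
              & (uv[:, 1] >= y0) & (uv[:, 1] <= y1))
    if inside.sum() < 4:
        return False  # too few points to estimate a 3D extent
    lo, hi = scale_range
    return lo <= ellipsoid_semi_axes(map_pts[inside]).max() <= hi
```

A detection whose supporting 3D points span, say, several metres when the target is a hand-sized object would be rejected by the scale test, which is the intuition behind ruling out detections by object scale.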
Key words: semi-direct monocular visual odometry (SVO); tracking on the selection tree (TST) recognizer; 3D point clouds; Gaussian ellipsoid fitting
CLC number: TP 181
- LEE T, SOATTO S. Learning and matching multiscale template descriptors for real-time detection, localization and tracking [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Colorado, USA: Institute of Electrical and Electronics Engineers, 2011: 1457–1464.
- BABENKO B, YANG M H, BELONGIE S. Visual tracking with online multiple instance learning [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, USA: Institute of Electrical and Electronics Engineers, 2009: 983–990.
- HINTERSTOISSER S, LEPETIT V, ILIC S, et al. Dominant orientation templates for real-time detection of texture-less objects [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Miami, USA: Institute of Electrical and Electronics Engineers, 2010: 2057–2069.
- LEE T, SOATTO S. TST/BTD: An end-to-end visual recognition system [R]. Los Angeles: UCLA Technical Report, 2010.
- BLÖSCH M, WEISS S, SCARAMUZZA D, et al. Vision based MAV navigation in unknown and unstructured environments [C]//Proceedings of the IEEE International Conference on Robotics and Automation. Alaska, USA: [s.n.], 2010: 21–28.
- FORSTER C, LYNEN S, KNEIP L, et al. Collaborative monocular SLAM with multiple micro aerial vehicles [C]//Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. 2013, 8215(2): 3962–3970.
- FORSTER C, PIZZOLI M, SCARAMUZZA D. Air-ground localization and map augmentation using monocular dense reconstruction [C]//Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Tokyo, Japan: IEEE, 2013: 592–625.
- FORSTER C, PIZZOLI M, SCARAMUZZA D. SVO: Fast semi-direct monocular visual odometry [C]//IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China: IEEE, 2014: 624–675.