Monocular SLAM System in Dynamic Scenes Based on Semantic Segmentation
Traditional feature-based visual SLAM algorithms assume a static environment when recovering scene information and camera motion, so dynamic objects in the scene degrade positioning accuracy. In this paper, we propose combining deep-learning-based image semantic segmentation with the traditional visual SLAM framework to reduce the interference of dynamic objects on the positioning results. First, a supervised Convolutional Neural Network (CNN) segments the objects in the input image to produce a semantic image. Second, feature points are extracted from the original image, and those lying on dynamic objects (cars and pedestrians) are eliminated according to the semantic image. Finally, a traditional monocular SLAM method tracks the camera motion using the remaining feature points. Experiments on the Apolloscape dataset show that, compared with the traditional method, the proposed method improves positioning accuracy in dynamic scenes by about 17%.
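The feature-elimination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the semantic image is an integer label map with the same resolution as the input image, that keypoints are given as (x, y) pixel coordinates, and that the label IDs for cars and pedestrians are the hypothetical values shown. The function name `filter_static_keypoints` is likewise introduced only for illustration.

```python
import numpy as np

# Hypothetical label IDs; the real values depend on the segmentation
# network's label map (assumed here: 1 = car, 2 = pedestrian).
DYNAMIC_CLASS_IDS = (1, 2)


def filter_static_keypoints(keypoints_xy, semantic_image,
                            dynamic_ids=DYNAMIC_CLASS_IDS):
    """Drop keypoints that fall on pixels labeled as dynamic classes.

    keypoints_xy   : iterable of (x, y) pixel coordinates
    semantic_image : 2-D integer array of per-pixel class labels
    Returns (kept_points, keep_mask).
    """
    pts = np.asarray(keypoints_xy, dtype=float).reshape(-1, 2)
    h, w = semantic_image.shape
    # Round to the nearest pixel and clamp to the image bounds.
    cols = np.clip(np.rint(pts[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(pts[:, 1]).astype(int), 0, h - 1)
    labels = semantic_image[rows, cols]
    keep = ~np.isin(labels, dynamic_ids)
    return pts[keep], keep


# Toy example: a 4x6 label map with a small "car" region.
semantic = np.zeros((4, 6), dtype=np.int32)
semantic[1:3, 2:4] = 1
kps = [(0.0, 0.0), (2.0, 1.0), (5.0, 3.0), (3.2, 2.4)]
static_kps, keep = filter_static_keypoints(kps, semantic)
```

In a full pipeline, only the retained points would be passed on to tracking. A practical system might also dilate the dynamic mask slightly so that features on object boundaries are removed as well; that refinement is an assumption here, not something the abstract states.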
Keywords: Monocular SLAM, Dynamic objects, Deep learning, Semantic segmentation, CNN
This research was supported by Jiangsu Surveying and Mapping Geographic Information Scientific Research Project (JSCHKY201808), National Key Research and Development Project (2016YFB0502101) and National Natural Science Foundation of China (41574026, 41774027).