Abstract
Visual localization is a critical technology in visual SLAM systems: it determines the relative position and motion trajectory by tracking feature points. In recent years, deep learning has been widely applied to visual localization. Deep-learning-based methods can overcome the limitations of traditional hand-crafted feature extraction and achieve high-precision visual localization in complex scenes, advancing the goal of lifelong SLAM. The MLP model is flexible and adaptable. Mixer-WMLP exchanges token information between spatial positions by evenly dividing the feature map into non-overlapping windows, giving it an approximately global receptive field. Compared with CNNs and Transformers, Mixer MLPs offer higher computational efficiency and robustness. In this paper, we use the Mixer MLP structure to design a deep-learning-based visual odometry system called MAIM-VO, which achieves high-quality matching even in complex scenes with low-texture areas. After the matched point pairs are obtained, the camera pose is solved by minimizing the reprojection error of the feature points. Experiments on multiple datasets and in real-world environments demonstrate that MAIM-VO achieves higher robustness and relative localization accuracy than currently popular visual SLAM systems.
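The window-based token mixing mentioned above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the window size, channel count, and MLP widths are hypothetical, and the MLP weights are random placeholders. It only shows the core idea of partitioning a feature map into non-overlapping windows and mixing information across the spatial positions within each window.

```python
import numpy as np

def window_partition(feat, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows.
    Returns (num_windows, win*win, C): spatial tokens grouped per window."""
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0, "feature map must tile evenly"
    x = feat.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)           # (H/win, W/win, win, win, C)
    return x.reshape(-1, win * win, C)

def token_mix(windows, W1, b1, W2, b2):
    """Token-mixing MLP: mixes information across the win*win spatial
    positions inside each window, applied independently per channel."""
    x = windows.transpose(0, 2, 1)           # (N, C, tokens)
    x = np.maximum(x @ W1 + b1, 0.0)         # hidden layer with ReLU
    x = x @ W2 + b2                          # project back to token dimension
    return x.transpose(0, 2, 1)              # (N, tokens, C)

# Toy example: an 8x8 map with 8 channels, 4x4 windows -> 4 windows of 16 tokens.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 8))
wins = window_partition(feat, 4)
tokens = wins.shape[1]                       # 16 tokens per window
W1 = rng.standard_normal((tokens, 32)); b1 = np.zeros(32)
W2 = rng.standard_normal((32, tokens)); b2 = np.zeros(tokens)
mixed = token_mix(wins, W1, b1, W2, b2)
print(wins.shape, mixed.shape)               # (4, 16, 8) (4, 16, 8)
```

Because every token inside a window attends to every other token in that window through the shared MLP, stacking such layers (with shifted or alternating partitions) lets information propagate across the whole feature map, approximating a global receptive field at MLP cost.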
Acknowledgments
This work was supported by the Institute of Robotics and Intelligent Manufacturing Innovation, Chinese Academy of Sciences (Grant number: C2021002).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shen, Z., Kong, B. (2023). MAIM-VO: A Robust Visual Odometry with Mixed MLP for Weak Textured Environment. In: Yongtian, W., Lifang, W. (eds) Image and Graphics Technologies and Applications. IGTA 2023. Communications in Computer and Information Science, vol 1910. Springer, Singapore. https://doi.org/10.1007/978-981-99-7549-5_6
DOI: https://doi.org/10.1007/978-981-99-7549-5_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7548-8
Online ISBN: 978-981-99-7549-5