Document Image Retrieval Based on Convolutional Neural Network

Zhou, Jie; Guo, Baolong; Zheng, Yan

doi:10.1007/978-981-13-9714-1_24

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 156)

Abstract

With the rapid growth of digital documents, big data places higher demands on document image retrieval, a field that sits between classical information retrieval and content-based retrieval. Traditional document image retrieval relies on complex OCR-based text recognition and text similarity detection. This paper proposes a new content-based retrieval method for document graphics objects that focuses on feature extraction, feature fusion, and indexing. A pretrained convolutional neural network (CNN) model is used to learn the image representation for the retrieval task and to extract several kinds of features from each document image. The extracted high-dimensional features are reduced with PCA and then combined into a new feature matrix by an improved rank fusion method based on Rank_avg. Transfer learning is used to fine-tune the trained CNN model before it is applied to the retrieval algorithm, which effectively compensates for the shortage of training data. Finally, results are sorted by feature similarity, and the query index is built with an inverted indexing technique based on a visual vocabulary. Experiments on document image datasets containing charts and text show that the method is better able to retrieve document images with similar text content. Fusing the dimension-reduced CNN features effectively improves the mean average precision (MAP) of the retrieval system: the MAP of a well-performing model fusion reaches 0.85. The inverted index based on the visual vocabulary reduces retrieval time by 27%.
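The fusion and evaluation steps named in the abstract can be illustrated with a minimal sketch. It shows plain rank averaging (the baseline that Rank_avg refers to), not the improved variant the paper proposes, whose details are not given here. The function names, document ids, and similarity scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Set


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each document id to its 1-based rank, highest similarity first."""
    # Ties are broken by document id so the result is deterministic.
    ordered = sorted(scores, key=lambda doc: (-scores[doc], doc))
    return {doc: position + 1 for position, doc in enumerate(ordered)}


def rank_average(score_tables: Sequence[Dict[str, float]]) -> List[str]:
    """Fuse several similarity tables by averaging each document's rank."""
    rank_tables = [ranks_from_scores(table) for table in score_tables]
    docs = list(rank_tables[0])
    mean_rank = {
        doc: sum(table[doc] for table in rank_tables) / len(rank_tables)
        for doc in docs
    }
    # A lower mean rank means the document is ranked higher after fusion.
    return sorted(docs, key=lambda doc: (mean_rank[doc], doc))


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant ids."""
    hits, total = 0, 0.0
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings: Sequence[Sequence[str]],
                           relevants: Sequence[Set[str]]) -> float:
    """MAP: the mean of the per-query average precision values."""
    values = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(values) / len(values)


# Hypothetical similarity scores of four documents to one query, as produced
# by two different feature types (assumed values, not from the paper).
scores_a = {"d1": 0.9, "d2": 0.8, "d3": 0.3, "d4": 0.1}
scores_b = {"d1": 0.7, "d2": 0.2, "d3": 0.9, "d4": 0.1}

fused = rank_average([scores_a, scores_b])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
print(average_precision(fused, {"d1", "d2"}))  # (1/1 + 2/3) / 2
```

Averaging ranks rather than raw scores avoids having to calibrate similarity values that come from different feature spaces, which is one common reason to prefer rank-level fusion.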

Author information

Authors and Affiliations

School of Aerospace Science and Technology, Xidian University, Xi’an, 710071, Shaanxi, People’s Republic of China
Jie Zhou, Baolong Guo & Yan Zheng

Corresponding author

Correspondence to Jie Zhou.

Editor information

Editors and Affiliations

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao Shi, Shandong, China
Jeng-Shyang Pan
Northeast Electric Power University, Chuanying Qu, Jilin, China
Jianpo Li
Swinburne University of Technology, Hawthorn, Melbourne, Australia
Pei-Wei Tsai
Centre for Artificial Intelligence, University of Technology Sydney, Sydney, NSW, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Zhou, J., Guo, B., Zheng, Y. (2020). Document Image Retrieval Based on Convolutional Neural Network. In: Pan, JS., Li, J., Tsai, PW., Jain, L. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. Smart Innovation, Systems and Technologies, vol 156. Springer, Singapore. https://doi.org/10.1007/978-981-13-9714-1_24

DOI: https://doi.org/10.1007/978-981-13-9714-1_24
Published: 10 July 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9713-4
Online ISBN: 978-981-13-9714-1
eBook Packages: Intelligent Technologies and Robotics (R0)
