A comparison of local features for camera-based document image retrieval and spotting

  • Quoc Bao Dang
  • Mickaël Coustaty
  • Muhammad Muzzamil Luqman
  • Jean-Marc Ogier
Special Issue Paper

Abstract

This paper compares the robustness of local features for camera-based document image retrieval and spotting systems. We present a review of the state of the art in local feature extraction, covering both keypoint detectors and keypoint descriptors. We also present a dataset and an evaluation protocol for camera-based document image retrieval and spotting systems. The dataset is composed of three subsets: the first contains images with textual content only, the second contains mainly graphical content, and the third contains both text and graphical elements. Along with the dataset, we present a protocol that defines measurements for evaluating the accuracy and processing time of camera-based document image retrieval and spotting systems. We then use this protocol to carry out a detailed evaluation of local features from the literature.
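
As a rough illustration only (not the authors' pipeline), the sketch below shows one common way local features are used for document retrieval and how accuracy and processing time can be measured. Every query descriptor votes for the database document that owns its nearest neighbour, and documents are ranked by vote count. Assumptions, all hypothetical: descriptors are binary strings packed into Python ints (the kind of output binary descriptors such as ORB or BRISK produce), matching is brute force under Hamming distance, ambiguous matches are dropped by a Lowe-style ratio test with threshold 0.8, and the metrics are top-1 accuracy and mean time per query. The names `hamming`, `retrieve`, `evaluate` and the toy data are illustrative, not taken from the paper.

```python
"""Minimal sketch: document retrieval by nearest-neighbour voting over binary
local descriptors, plus an accuracy/timing evaluation loop. Descriptors are
hypothetical ints standing in for real binary descriptors; no detector is run."""
import time
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two binary descriptors packed into ints.
    return bin(a ^ b).count("1")


def retrieve(query: Sequence[int],
             database: Dict[str, List[int]],
             ratio: float = 0.8) -> List[str]:
    """Rank documents by how many query descriptors have their nearest
    neighbour (Hamming distance) in that document. A Lowe-style ratio test
    (an assumption here) discards ambiguous matches."""
    # Flatten the database into (descriptor, doc_id) pairs for brute-force search.
    index = [(d, doc) for doc, descs in database.items() for d in descs]
    votes: Counter = Counter()
    for q in query:
        # Sort all database descriptors by distance to the query descriptor.
        dists = sorted((hamming(q, d), doc) for d, doc in index)
        if len(dists) < 2:
            continue  # the ratio test needs two neighbours
        (d1, doc1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:  # keep only distinctive matches
            votes[doc1] += 1
    return [doc for doc, _ in votes.most_common()]


def evaluate(queries: List[Tuple[List[int], str]],
             database: Dict[str, List[int]],
             retrieve_fn: Callable[..., List[str]] = retrieve) -> Tuple[float, float]:
    """Return (top-1 accuracy, mean seconds per query) over labelled queries."""
    correct = 0
    elapsed = 0.0
    for descs, truth in queries:
        start = time.perf_counter()
        ranking = retrieve_fn(descs, database)
        elapsed += time.perf_counter() - start
        if ranking and ranking[0] == truth:
            correct += 1
    n = len(queries)
    return (correct / n, elapsed / n) if n else (0.0, 0.0)


if __name__ == "__main__":
    # Toy database: two documents with three 8-bit descriptors each.
    db = {
        "doc_a": [0b1111_0000, 0b1010_1010, 0b0000_1111],
        "doc_b": [0b1100_1100, 0b0011_0011, 0b1111_1111],
    }
    # Queries are noisy copies (one flipped bit) of each document's descriptors.
    qs = [([0b1111_0001, 0b1010_1011], "doc_a"),
          ([0b1100_1101, 0b0011_0010], "doc_b")]
    acc, secs = evaluate(qs, db)
    print(f"top-1 accuracy = {acc:.2f}, mean time = {secs * 1e3:.3f} ms")
```

The toy queries are constructed so that both are retrieved correctly. A real system would replace the brute-force search with an approximate nearest-neighbour index and typically add geometric verification of the matches, which matters mostly for the processing-time side of the evaluation.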

Keywords

Camera-based document image analysis, recognition and retrieval; Keypoint detection; Keypoint extraction; Local feature; Document image analysis, recognition and understanding; Pattern recognition

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. L3i Laboratory, University of La Rochelle, La Rochelle Cedex 1, France