Multi-domain Document Layout Understanding Using Few-Shot Object Detection

  • Conference paper
Image Analysis and Recognition (ICIAR 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12132)

Abstract

We address the problem of document layout understanding with a simple algorithm that generalizes across multiple domains while training on only a few examples per domain. We treat the problem as supervised object detection and propose a methodology that removes the need for large datasets. Using transfer learning, we pre-train our object detector on a simple artificial (source) dataset and fine-tune it on a tiny domain-specific (target) dataset. We show that this methodology works across multiple domains with as few as 10 training documents. We demonstrate the effect of each component of the methodology on the end result and show that it outperforms simple object detectors. We will open-source the code, trained models, and source and target datasets upon acceptance.
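The two-stage schedule described in the abstract (pre-train on a large artificial source set, then fine-tune on a tiny target set) can be illustrated with a toy sketch. This is not the authors' detector or data: a 1-D linear model stands in for the object detector, and the `source` and `target` sets, the `train` and `mse` helpers, the learning rate, and the step counts are all hypothetical choices made for illustration only.

```python
# Toy sketch of "pre-train on a large source set, fine-tune on a tiny target set".
# Assumption: a 1-D linear model y = w*x + b stands in for the object detector,
# and all data below is made up for illustration.

def mse(params, data):
    """Mean squared error of the linear model (w, b) on a list of (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(params, data, steps, lr=0.1):
    """Plain gradient descent on the mean squared error; returns updated (w, b)."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

# Large artificial "source" set: 200 samples of y = 2x + 1.
source = [(i / 200, 2 * (i / 200) + 1) for i in range(200)]
# Tiny "target" set: 10 samples of a related but shifted task, y = 2x + 1.5.
target = [(i / 10, 2 * (i / 10) + 1.5) for i in range(10)]

# Stage 1: pre-train on the source set. Stage 2: fine-tune briefly on the target set.
pretrained = train((0.0, 0.0), source, steps=2000)
finetuned = train(pretrained, target, steps=20)

# Baseline: the same short training budget on the target set, without pre-training.
scratch = train((0.0, 0.0), target, steps=20)

print("pre-trained only:", mse(pretrained, target))
print("fine-tuned      :", mse(finetuned, target))
print("from scratch    :", mse(scratch, target))
```

In this toy setting the fine-tuned model starts close to the target solution, so the same short training budget leaves it with a lower target error than the model trained from scratch. Whether and how strongly this carries over to real detectors depends on how closely the source data resembles the target domain.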

P. Singh—Contributed equally.

Notes

  1. https://github.com/kuangliu/torchcv

Author information

Corresponding author

Correspondence to Srikrishna Varadarajan.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Singh, P., Varadarajan, S., Singh, A.N., Srivastava, M.M. (2020). Multi-domain Document Layout Understanding Using Few-Shot Object Detection. In: Campilho, A., Karray, F., Wang, Z. (eds) Image Analysis and Recognition. ICIAR 2020. Lecture Notes in Computer Science, vol 12132. Springer, Cham. https://doi.org/10.1007/978-3-030-50516-5_8

  • DOI: https://doi.org/10.1007/978-3-030-50516-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-50515-8

  • Online ISBN: 978-3-030-50516-5

  • eBook Packages: Computer Science, Computer Science (R0)
