Multi-domain Document Layout Understanding Using Few-Shot Object Detection

Singh, Pranaydeep; Varadarajan, Srikrishna; Singh, Ankit Narayan; Srivastava, Muktabh Mayank

doi:10.1007/978-3-030-50516-5_8


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12132)

Included in the following conference series:

International Conference on Image Analysis and Recognition

Abstract

We address the problem of document layout understanding with a simple algorithm that generalizes across multiple domains while training on just a few examples per domain. We approach this problem via a supervised object detection method and propose a methodology to overcome the requirement for large datasets. We use transfer learning: we pre-train our object detector on a simple artificial (source) dataset and fine-tune it on a tiny domain-specific (target) dataset. We show that this methodology works for multiple domains with as few as 10 training documents. We demonstrate the effect of each component of the methodology on the end result and show the superiority of this methodology over simple object detectors. We will open-source the code, trained models, and source and target datasets upon acceptance.
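To make the data side of this two-stage setup concrete, the following is a minimal sketch of how an artificial source set and a 10-document target set might be assembled. It is not the authors' pipeline: the layout classes, page size, box generator, domain names, and file names are all hypothetical, and the detector itself (pre-training and fine-tuning) is left out.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Box:
    label: str
    x0: int
    y0: int
    x1: int
    y1: int


# Hypothetical layout classes; the abstract does not list the real ones.
CLASSES = ["title", "paragraph", "table", "figure"]


def make_synthetic_page(rng: random.Random, width: int = 600,
                        height: int = 800, n_boxes: int = 5) -> List[Box]:
    """Stack randomly sized, labelled boxes top to bottom so they never overlap."""
    boxes = []
    band = height // n_boxes  # each box stays inside its own horizontal band
    for i in range(n_boxes):
        x0 = rng.randint(0, width // 2 - 1)
        x1 = rng.randint(width // 2, width)
        y0 = i * band + rng.randint(0, band // 4)
        y1 = (i + 1) * band - rng.randint(1, band // 4)
        boxes.append(Box(rng.choice(CLASSES), x0, y0, x1, y1))
    return boxes


def few_shot_split(docs_by_domain: Dict[str, List[str]], k: int = 10,
                   seed: int = 0) -> Dict[str, List[str]]:
    """Draw at most k annotated documents per target domain."""
    rng = random.Random(seed)
    return {domain: rng.sample(docs, min(k, len(docs)))
            for domain, docs in docs_by_domain.items()}


# Stage 1 data: a large artificial (source) set for pre-training.
rng = random.Random(0)
source_dataset = [make_synthetic_page(rng) for _ in range(1000)]

# Stage 2 data: a tiny domain-specific (target) set for fine-tuning.
# Domain and file names below are placeholders, not the paper's datasets.
target_pool = {
    "invoices": [f"invoice_{i}.png" for i in range(50)],
    "papers": [f"paper_{i}.png" for i in range(50)],
}
target_dataset = few_shot_split(target_pool, k=10)
```

In this sketch a detector would first be trained on `source_dataset` and then fine-tuned separately on each domain's entry in `target_dataset`; the fixed seeds only make the sampling reproducible.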

P. Singh—Contributed equally.

Notes

1. https://github.com/kuangliu/torchcv

Author information

Authors and Affiliations

ParallelDots, Inc., Lewes, USA
Pranaydeep Singh, Srikrishna Varadarajan, Ankit Narayan Singh & Muktabh Mayank Srivastava

Corresponding author

Correspondence to Srikrishna Varadarajan.

Editor information

Editors and Affiliations

University of Porto, Porto, Portugal
Aurélio Campilho
University of Waterloo, Waterloo, ON, Canada
Fakhri Karray
University of Waterloo, Waterloo, ON, Canada
Zhou Wang

About this paper

Cite this paper

Singh, P., Varadarajan, S., Singh, A.N., Srivastava, M.M. (2020). Multi-domain Document Layout Understanding Using Few-Shot Object Detection. In: Campilho, A., Karray, F., Wang, Z. (eds) Image Analysis and Recognition. ICIAR 2020. Lecture Notes in Computer Science, vol 12132. Springer, Cham. https://doi.org/10.1007/978-3-030-50516-5_8

DOI: https://doi.org/10.1007/978-3-030-50516-5_8
Published: 17 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50515-8
Online ISBN: 978-3-030-50516-5
eBook Packages: Computer Science, Computer Science (R0)
