Document Image Classification on the Basis of Layout Information

  • Chapter
  • First Online:
Document Image Processing for Scanning and Printing

Part of the book series: Signals and Communication Technology (SCT)

Abstract

In this chapter, a document image classification framework based on layout information is described. The proposed method does not use optical character recognition (OCR); hence, it is completely language independent. Nonetheless, text data are exploited by extracting text regions with a novel modification of the maximally stable extremal regions (MSER) approach. The modified MSER formulation is considerably more robust to text distortions than the existing approach. Two novel types of image descriptors are introduced and supplemented with Fisher vectors based on a Bernoulli mixture model. Classifiers built on these descriptors are combined into a meta-classification system that can classify documents in difficult cases where the accuracy of the individual classifiers is poor. The resulting meta-classification system has a low processing time, comparable to that of a single classifier. It is also shown that the method achieves higher classification accuracy than existing techniques for a wide range of documents from both well-known and machine-generated document datasets.
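The abstract mentions Fisher vectors based on a Bernoulli mixture model (BMM), which encode a variable-size set of binary local descriptors as one fixed-length vector. The following is a minimal sketch of that general technique, not the chapter's implementation. It assumes the BMM weights (positive, summing to 1) and per-bit means were learned beforehand (for example with EM) and that descriptors are lists of 0/1 values. The normalization transposes the usual closed-form Fisher-information approximation used for Gaussian-mixture Fisher vectors to Bernoulli components and then applies power and L2 normalization. These choices, and the names `bmm_posteriors` and `bmm_fisher_vector`, are hypothetical.

```python
import math
from typing import List, Sequence


def bmm_posteriors(x: Sequence[int], weights: Sequence[float],
                   means: Sequence[Sequence[float]]) -> List[float]:
    """Responsibilities gamma_k(x) of each Bernoulli component for one binary vector x."""
    log_probs = []
    for w, mu in zip(weights, means):
        # log w_k + sum_d [x_d log mu_kd + (1 - x_d) log(1 - mu_kd)]
        lp = math.log(w)
        for xd, m in zip(x, mu):
            lp += math.log(m) if xd else math.log(1.0 - m)
        log_probs.append(lp)
    # Log-sum-exp trick: subtract the maximum before exponentiating to avoid underflow.
    top = max(log_probs)
    exps = [math.exp(lp - top) for lp in log_probs]
    total = sum(exps)
    return [e / total for e in exps]


def bmm_fisher_vector(descriptors: Sequence[Sequence[int]], weights: Sequence[float],
                      means: Sequence[Sequence[float]], eps: float = 1e-6) -> List[float]:
    """Hypothetical sketch: Fisher vector of T binary descriptors under a K-component BMM."""
    T = len(descriptors)
    K = len(weights)
    D = len(means[0])
    # Clamp the means away from 0 and 1 so that log and sqrt stay finite.
    mus = [[min(max(m, eps), 1.0 - eps) for m in mu] for mu in means]
    grad_w = [0.0] * K
    grad_mu = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        gamma = bmm_posteriors(x, weights, mus)
        for k in range(K):
            # Gradient w.r.t. the mixture weights (softmax parametrization): gamma_k - w_k.
            grad_w[k] += gamma[k] - weights[k]
            for d in range(D):
                m = mus[k][d]
                # Gradient w.r.t. mu_kd, already whitened by the approximate
                # Fisher information w_k / (mu_kd (1 - mu_kd)).
                grad_mu[k][d] += gamma[k] * (x[d] - m) / math.sqrt(m * (1.0 - m))
    # Scale every block by 1 / (T sqrt(w_k)); layout is [K weight terms, K*D mean terms].
    fv = [grad_w[k] / (T * math.sqrt(weights[k])) for k in range(K)]
    for k in range(K):
        scale = 1.0 / (T * math.sqrt(weights[k]))
        fv.extend(g * scale for g in grad_mu[k])
    # Power normalization (signed square root) followed by L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv
```

With K components and D-bit descriptors, the result has K + K·D entries regardless of how many descriptors a page yields. This is what lets a vector of this kind be passed to a standard fixed-input classifier, such as a linear one, which is itself an assumption in this sketch.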


Author information

Corresponding author

Correspondence to Ilia V. Safonov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Safonov, I.V., Kurilin, I.V., Rychagov, M.N., Tolstaya, E.V. (2019). Document Image Classification on the Basis of Layout Information. In: Document Image Processing for Scanning and Printing. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-05342-0_6

  • DOI: https://doi.org/10.1007/978-3-030-05342-0_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05341-3

  • Online ISBN: 978-3-030-05342-0

  • eBook Packages: Engineering, Engineering (R0)
