Document Image Classification on the Basis of Layout Information

Safonov, Ilia V.; Kurilin, Ilya V.; Rychagov, Michael N.; Tolstaya, Ekaterina V.

doi:10.1007/978-3-030-05342-0_6

Part of the book series: Signals and Communication Technology (SCT)

Abstract

This chapter describes a document image classification framework based on layout information. The proposed method does not use optical character recognition (OCR) and is therefore completely language independent. Text information is nonetheless exploited: text regions are extracted with a novel modification of the maximally stable extremal regions (MSER) approach, which is more robust to text distortions than the existing approach. Two types of novel image descriptors are supplemented with Fisher vectors based on the Bernoulli mixture model. Classifiers built on these descriptors are combined in a meta-classification system that can classify documents in complex cases for which the accuracy of individual classifiers is poor. The processing time of the meta-classification system is low, comparable to that of a single classifier. The method is also shown to outperform existing techniques in classification accuracy for a wide range of documents from both well-known and machine-generated document datasets.
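As a rough illustration of what a Fisher vector over a Bernoulli mixture model can look like, the sketch below encodes a set of binary local descriptors as the normalized gradient of the mixture log-likelihood with respect to the component means. The abstract does not specify the descriptors, the mixture size, or the normalization, so the function names (`bmm_posteriors`, `bmm_fisher_vector`), the restriction to mean gradients, and the final L2 normalization are all assumptions made for illustration and should not be read as the authors' formulation.

```python
import numpy as np


def bmm_posteriors(X, weights, mu, eps=1e-9):
    """Responsibilities gamma[t, k] of mixture component k for binary descriptor t."""
    X = np.asarray(X, dtype=float)
    mu = np.clip(np.asarray(mu, dtype=float), eps, 1.0 - eps)
    # log p_k(x_t) = sum_d x_td * log(mu_kd) + (1 - x_td) * log(1 - mu_kd)
    log_lik = X @ np.log(mu).T + (1.0 - X) @ np.log1p(-mu).T  # shape (T, K)
    log_joint = log_lik + np.log(weights)
    # Subtract the row maximum before exponentiating for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)


def bmm_fisher_vector(X, weights, mu, eps=1e-9):
    """Hypothetical helper: L2-normalized gradient of the average
    log-likelihood with respect to the Bernoulli means (length K * D)."""
    X = np.asarray(X, dtype=float)
    mu = np.clip(np.asarray(mu, dtype=float), eps, 1.0 - eps)
    gamma = bmm_posteriors(X, weights, mu, eps)
    T = X.shape[0]
    # d/d mu_kd of log p(x_t) = gamma_tk * (x_td - mu_kd) / (mu_kd * (1 - mu_kd))
    grad = (gamma.T @ X - gamma.sum(axis=0)[:, None] * mu) / (mu * (1.0 - mu)) / T
    fv = grad.ravel()
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


# Toy usage: 50 random 8-bit descriptors and a 3-component mixture (made-up values).
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(50, 8))
weights_demo = np.array([0.5, 0.3, 0.2])
mu_demo = rng.uniform(0.2, 0.8, size=(3, 8))
fv_demo = bmm_fisher_vector(X_demo, weights_demo, mu_demo)
```

The resulting vector has one entry per component and descriptor bit, so its length is fixed regardless of how many local descriptors a page yields. That fixed length is what makes such an encoding convenient as input to a standard classifier.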

Author information

Authors and Affiliations

Moscow, Russia
Ilia V. Safonov, Ilya V. Kurilin, Michael N. Rychagov & Ekaterina V. Tolstaya

Corresponding author

Correspondence to Ilia V. Safonov.

About this chapter

Cite this chapter

Safonov, I.V., Kurilin, I.V., Rychagov, M.N., Tolstaya, E.V. (2019). Document Image Classification on the Basis of Layout Information. In: Document Image Processing for Scanning and Printing. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-05342-0_6

DOI: https://doi.org/10.1007/978-3-030-05342-0_6
Published: 26 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05341-3
Online ISBN: 978-3-030-05342-0
eBook Packages: Engineering, Engineering (R0)
