Abstract
This chapter describes a framework for classifying document images on the basis of layout information. The proposed method does not use optical character recognition (OCR) and is therefore completely language independent. Text data are nonetheless exploited: text regions are extracted with a novel maximally stable extremal regions (MSER) approach. The modified MSER formulation is considerably more robust to text distortions than the existing approach. Two types of novel image descriptors are supplemented with Fisher vectors based on the Bernoulli mixture model. Classifiers built on these descriptors are combined in a meta-classification system that can classify documents in complex cases where the accuracy of an individual classifier is poor. The processing time of the meta-classification system is low, comparable to that of a single classifier. The method is also shown to outperform existing techniques in classification accuracy for a wide range of documents from both well-known and machine-generated document datasets.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Safonov, I.V., Kurilin, I.V., Rychagov, M.N., Tolstaya, E.V. (2019). Document Image Classification on the Basis of Layout Information. In: Document Image Processing for Scanning and Printing. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-05342-0_6
DOI: https://doi.org/10.1007/978-3-030-05342-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05341-3
Online ISBN: 978-3-030-05342-0
eBook Packages: Engineering, Engineering (R0)