Abstract
This article describes a document analysis system based on document modeling. The system is applied to the forms used by the CAF (Caisse d'Allocations Familiales), the French national family allowance department. It is composed of three modules, each handling a different stage of form processing. The first module, low-level processing, is divided into three stages: acquisition, binarisation and skew correction. These stages transform a paper form into an image of adequate quality. The second module, document structuration, processes this image to extract the information contained in the form. The information is arranged as a tree that shows the hierarchical organisation of the form content. In addition to tree extraction, the document structuration module allows the creation of a form model base. The last module, form class identification, uses the tree and the form model base. It is composed of two pre-classifiers, which extract lists of candidate forms, and a structural classifier. The two pre-classifiers filter the 250 form classes in order to reduce the work of the structural classifier. This classifier relies on graph matching to compare the tree of the input form with the list of candidate forms produced by the two pre-classifiers.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Diana, S., Trupin, E., Lecourtier, Y., Labiche, J. (1997). Document modeling for form class identification. In: Murshed, N.A., Bortolozzi, F. (eds) Advances in Document Image Analysis. BSDIA 1997. Lecture Notes in Computer Science, vol 1339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63791-5_13
DOI: https://doi.org/10.1007/3-540-63791-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63791-2
Online ISBN: 978-3-540-69646-9
eBook Packages: Springer Book Archive