Document modeling for form class identification

Diana, Sébastien; Trupin, Eric; Lecourtier, Yves; Labiche, Jacques

doi:10.1007/3-540-63791-5_13


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1339)

Included in the following conference series:

Brazilian Symposium on Document Image Analysis


Abstract

This article describes a document analysis system based on document modeling. The system is applied to forms used by the CAF (Caisse d'Allocations Familiales), the French national family allowance department. It is composed of three modules, each handling a different part of form processing. The first module, low-level processing, is divided into three stages: acquisition, binarisation and skew correction. These stages transform a paper form into an image of adequate quality. The second module, document structuration, processes this image to extract the information contained in the form. The information is arranged into a tree that represents the hierarchical organisation of the form content. In addition to tree extraction, the document structuration module allows the creation of a form model base. The last module, form class identification, uses the tree and the form model base. It is composed of two pre-classifiers, which extract lists of candidate forms, and a structural classifier. The two pre-classifiers filter the 250 form classes in order to reduce the work left to the structural classifier. This classifier uses graph matching to compare the tree of the form being processed with the candidate form models retained by the two pre-classifiers.
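The identification stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which features the two pre-classifiers use or which graph matching algorithm is applied. The sketch assumes, purely for illustration, that the first pre-classifier compares tree sizes, that the second compares tree depths, and that a greedy top-down tree comparison stands in for graph matching. The names `Node`, `size`, `depth`, `similarity`, `identify` and the parameter `size_tol` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One block of a form tree (labels such as 'form' or 'zone' are hypothetical)."""
    label: str
    children: list[Node] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the tree rooted at `node`."""
    return 1 + sum(size(c) for c in node.children)


def depth(node: Node) -> int:
    """Number of levels in the tree rooted at `node`."""
    return 1 + max((depth(c) for c in node.children), default=0)


def similarity(a: Node, b: Node) -> int:
    """Count nodes paired by a greedy top-down comparison.

    This is a simple stand-in for graph matching, not an optimal matcher.
    """
    if a.label != b.label:
        return 0
    score = 1
    unused = list(b.children)
    for child in a.children:
        # Pair each child with the best still-unused child of the other tree.
        best = max(unused, key=lambda c: similarity(child, c), default=None)
        if best is not None and similarity(child, best) > 0:
            score += similarity(child, best)
            unused.remove(best)
    return score


def identify(tree: Node, models: dict[str, Node], size_tol: int = 1) -> str | None:
    """Return the name of the best-matching model, or None if none survives filtering."""
    # Pre-classifier 1 (assumed criterion): keep models of similar size.
    candidates = {
        name: model
        for name, model in models.items()
        if abs(size(model) - size(tree)) <= size_tol
    }
    # Pre-classifier 2 (assumed criterion): keep models of equal depth.
    candidates = {
        name: model
        for name, model in candidates.items()
        if depth(model) == depth(tree)
    }
    if not candidates:
        return None
    # Structural classifier: compare the tree only with the remaining candidates.
    return max(candidates, key=lambda name: similarity(tree, candidates[name]))
```

Calling `identify(tree, models)` with a dictionary that maps class names to model trees returns the name of the closest surviving model. The point of the design is that the inexpensive filters run over the whole model base, while the more costly structural comparison runs only on the short list they leave.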



Author information

Authors and Affiliations

Laboratoire PSI / La3I, Université de Rouen, 76821, Mont Saint Aignan Cédex, France
Sébastien Diana, Eric Trupin & Yves Lecourtier
Laboratoire ISMRA / LACP, Université de Caen, 14050, Caen Cédex, France
Jacques Labiche


Editor information

Nabeel A. Murshed, Flávio Bortolozzi


About this paper

Cite this paper

Diana, S., Trupin, E., Lecourtier, Y., Labiche, J. (1997). Document modeling for form class identification. In: Murshed, N.A., Bortolozzi, F. (eds) Advances in Document Image Analysis. BSDIA 1997. Lecture Notes in Computer Science, vol 1339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63791-5_13


DOI: https://doi.org/10.1007/3-540-63791-5_13
Published: 02 August 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63791-2
Online ISBN: 978-3-540-69646-9
eBook Packages: Springer Book Archive
