Abstract
This paper proposes a new method for labelling the logical structures of document images. The system starts from digitised images of paper documents, performs a physical layout analysis, runs OCR, and finally exploits the OCR output to determine the meaning of each text block (i.e. it assigns labels such as “Title”, “Author”, etc.). The method extends our previous work, in which a classifier, the perceptive neural network, was developed by analogy with human perception. We introduce a temporal dimension into this connectionist model by using a time-delay neural network with a local representation. During the recognition stage, the system performs several recognition cycles and corrections while keeping track of, and reusing, its previous outputs. This dynamic classifier therefore handles noise and segmentation errors better. The experiments were carried out on two datasets: the public MARG dataset, containing more than 1,500 front pages of scientific papers with four zones of interest, and a dataset composed of documents from the Siggraph 2003 conference, in which 21 logical structures were identified. The error rate is below 2.5% on MARG and 7.3% on the Siggraph dataset.
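The idea of a classifier that cycles over its own previous outputs can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the authors' implementation. The class `DynamicLabeller`, the label set, the feature values, the delay of two cycles and the convergence tolerance are all hypothetical, and the weights are random and untrained. The sketch only shows the general mechanism of a tapped delay line over the classifier's outputs. At each cycle, a block's features are concatenated with the label outputs from the previous `delay` cycles, using one output neuron per label in the spirit of a local representation. Cycling stops when the outputs stop changing or a cycle limit is reached.

```python
import math
import random
from collections import deque

# Hypothetical label set, for illustration only.
LABELS = ["Title", "Author", "Affiliation", "Abstract"]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class DynamicLabeller:
    """Toy dynamic classifier (hypothetical, untrained).

    At recognition cycle t the input is the block's static features
    followed by the classifier's own outputs from cycles t-1 ... t-delay.
    """

    def __init__(self, n_features, n_labels, delay=2, seed=0):
        rng = random.Random(seed)
        self.delay = delay
        self.n_labels = n_labels
        n_in = n_features + delay * n_labels
        # Random weights stand in for trained ones.
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                  for _ in range(n_labels)]
        self.b = [0.0] * n_labels

    def step(self, x):
        """One forward pass: a linear layer followed by softmax."""
        z = [sum(wi * xi for wi, xi in zip(row, x)) + b
             for row, b in zip(self.w, self.b)]
        return softmax(z)

    def recognise(self, features, max_cycles=10, tol=1e-4):
        """Run recognition cycles until the outputs stabilise.

        Returns (label probabilities, number of cycles performed).
        """
        if max_cycles < 1:
            raise ValueError("max_cycles must be at least 1")
        uniform = [1.0 / self.n_labels] * self.n_labels
        # Delay line holding the last `delay` output vectors (oldest first).
        history = deque([uniform] * self.delay, maxlen=self.delay)
        out = uniform
        for cycle in range(1, max_cycles + 1):
            x = list(features) + [v for past in history for v in past]
            new = self.step(x)
            change = max(abs(a - b) for a, b in zip(new, out))
            history.append(new)  # the oldest output drops out of the window
            out = new
            if change < tol:
                break
        return out, cycle


if __name__ == "__main__":
    model = DynamicLabeller(n_features=3, n_labels=len(LABELS), delay=2, seed=42)
    # Made-up features for one text block (e.g. font size, position, boldness).
    probs, cycles = model.recognise([0.9, 0.1, 0.0])
    best = LABELS[probs.index(max(probs))]
    print(f"{best} after {cycles} cycle(s): {[round(p, 3) for p in probs]}")
```

In a real system the weights would be learned and the features would come from the layout-analysis and OCR stages. Here both are placeholders, so the predicted label is meaningless; only the feedback loop over delayed outputs is the point of the example.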
Cite this article
Rangoni, Y., Belaïd, A. & Vajda, S. Labelling logical structures of document images using a dynamic perceptive neural network. IJDAR 15, 45–55 (2012). https://doi.org/10.1007/s10032-011-0151-y