Multimedia Tools and Applications, Volume 77, Issue 22, pp 29475–29505

Survey and empirical comparison of different approaches for text extraction from scholarly figures

  • Falk Böschen
  • Tilman Beck
  • Ansgar Scherp


Different approaches have been proposed in the past to address the challenge of extracting text from scholarly figures. However, until recently, no comparative evaluation of the different approaches had been conducted. Thus, we performed an extensive study of the related work and evaluated in total 32 different approaches. In this work, we perform a more detailed comparison of the 7 most relevant approaches described in the literature and extend to 37 systematic linear combinations of methods for extracting text from scholarly figures. Our generic pipeline, consisting of six steps, allows us to freely combine the different possible methods and perform a fair comparison. Overall, we have evaluated 44 different linear pipeline configurations and systematically compared the different methods. We then derived two non-linear configurations and a two-pass approach. We evaluate all pipeline configurations over four datasets of scholarly figures of different origin and characteristics. The quality of the extraction results is assessed using F-measure and Levenshtein distance, and we measure the runtime performance. Our experiments showed that there is a linear configuration that overall shows the best text extraction quality on all datasets. Further experiments showed that the best configuration can be improved by extending it to a two-pass approach. Regarding the runtime, we observed huge differences from very fast approaches to those running for several weeks. Our experiments found the best working configuration for text extraction from our method set. However, they also showed that further improvements regarding region extraction and classification are needed.
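The two quality measures named in the abstract can be illustrated with a minimal sketch. It assumes the textbook definitions: Levenshtein distance as the minimum number of single-character edits between two strings, and F-measure as the harmonic mean of precision and recall. The function names and the counts passed in are hypothetical, and how the study matches extracted text to ground truth is not specified here, so this is not the authors' evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall (balanced F-measure, F1)."""
    if true_positives == 0:
        return 0.0  # avoids division by zero when nothing was matched
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example: an OCR result compared with its ground-truth label.
    print(levenshtein("Frequency (Hz)", "Freguency (Hz)"))
    # Hypothetical counts of matched, spurious, and missed text elements.
    print(f_measure(true_positives=8, false_positives=2, false_negatives=2))
```

A lower Levenshtein distance means the recognized text is closer to the ground truth, while a higher F-measure means more text elements were found without adding spurious ones.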


Scholarly figures, Text extraction, Comparison, Figure search



This research was co-financed by the EU H2020 project MOVING under contract no. 693092. We thank ABBYY Europe GmbH for providing us with a test license of the ABBYY FineReader for our experiments.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Kiel University, Kiel, Germany
  2. ZBW - Leibniz Information Centre for Economics, Kiel, Germany
