Survey and empirical comparison of different approaches for text extraction from scholarly figures

Multimedia Tools and Applications

Abstract

Different approaches have been proposed to address the challenge of extracting text from scholarly figures. However, until recently, no comparative evaluation of these approaches had been conducted. We therefore performed an extensive study of the related work and evaluated a total of 32 different approaches. In this work, we perform a more detailed comparison of the 7 most relevant approaches described in the literature and extend them to 37 systematic linear combinations of methods for extracting text from scholarly figures. Our generic pipeline, consisting of six steps, allows us to freely combine the different possible methods and perform a fair comparison. Overall, we evaluated 44 different linear pipeline configurations and systematically compared the different methods. We then derived two non-linear configurations and a two-pass approach. We evaluate all pipeline configurations on four datasets of scholarly figures of different origin and characteristics. The quality of the extraction results is assessed using F-measure and Levenshtein distance, and we also measure runtime performance. Our experiments showed that one linear configuration achieves the best overall text extraction quality across all datasets. Further experiments showed that this best configuration can be improved by extending it to a two-pass approach. Regarding runtime, we observed huge differences, from very fast approaches to those running for several weeks. Our experiments identified the best-performing configuration for text extraction from our method set. However, they also showed that further improvements in region extraction and classification are needed.
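The two quality measures named in the abstract can be illustrated with a minimal sketch. The abstract does not state how extracted text elements are matched against the ground truth, so the exact-match, multiset-based matching below is an assumption, and the function names `levenshtein` and `f_measure` are hypothetical rather than taken from the authors' software.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def f_measure(extracted: list[str], ground_truth: list[str]) -> float:
    """Harmonic mean of precision and recall.

    Assumption: an extracted element counts as correct only if it is
    an exact match for a ground-truth element (multiset intersection).
    """
    matched = sum((Counter(extracted) & Counter(ground_truth)).values())
    if matched == 0:
        return 0.0
    precision = matched / len(extracted)
    recall = matched / len(ground_truth)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(f_measure(["Revenue", "2015", "Q1"],
                    ["Revenue", "2016", "Q1", "Q2"]))
```

In this sketch the F-measure rewards elements that are recovered exactly, while the Levenshtein distance captures how far a recognized string is from its ground-truth counterpart at the character level, so the two measures complement each other.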

Notes

  1. http://www.kd.informatik.uni-kiel.de/en/research/software/text-extraction, last access: September, 2017

  2. http://www.degruyter.com/, last access: September, 2017

  3. http://www.kd.informatik.uni-kiel.de/en/research/software/text-extraction, last access: September, 2017

  4. https://github.com/tesseract-ocr/, last access: September, 2017

  5. http://www.abbyy.com/ocr-sdk/, last access: September, 2017

  6. http://www-e.uni-magdeburg.de/jschulen/ocr/index.html, last access: September, 2017

  7. https://www.nuance.com/print-capture-and-pdf-solutions/optical-character-recognition/omnipage/omnipage-server-for-developers.html, last access: September, 2017

  8. https://github.com/tmbdev/ocropy, last access: September, 2017

  9. https://www.abbyy.com/en-us/ocr-sdk/, last access: September, 2017

  10. https://www.econbiz.de/, last access: September, 2017

  11. http://www.degruyter.com/, last access: September, 2017

  12. http://www.degruyter.com/dg/page/open-access-policy, last access: September, 2017

  13. https://www.comp.nus.edu.sg/tancl/ChartImageDataset.htm, last access: September, 2017

Acknowledgments

This research was co-financed by the EU H2020 project MOVING (http://www.moving-project.eu/) under contract no. 693092. We thank ABBYY Europe GmbH for providing us with a test license of ABBYY FineReader for our experiments.

Author information

Corresponding author

Correspondence to Falk Böschen.

About this article

Cite this article

Böschen, F., Beck, T. & Scherp, A. Survey and empirical comparison of different approaches for text extraction from scholarly figures. Multimed Tools Appl 77, 29475–29505 (2018). https://doi.org/10.1007/s11042-018-6162-7

  • DOI: https://doi.org/10.1007/s11042-018-6162-7
