OCR Using Computer Vision and Machine Learning

Ranjan, Ashish; Behera, Varun Nagesh Jolly; Reza, Motahar

doi:10.1007/978-3-030-50641-4_6

Part of the book series: Studies in Computational Intelligence (SCI, volume 907)

Abstract

Optical character recognition (OCR) is a well-researched area that has seen many advances. Many state-of-the-art algorithms can be used for OCR, but extracting text from images that contain tables while preserving the table structure remains a challenging task. Here, we present an efficient and highly scalable parallel architecture that segments input images containing tabular data, with or without borders, into cells and reconstructs the data while preserving the tabular format. The resulting performance improvement can ease the tedious task of digitizing tabular data in bulk. The same architecture can also be used for regular OCR applications to improve performance when the volume of data is large.
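
The abstract does not spell out the architecture, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the chapter's method. It assumes a bordered table given as a binary pixel grid (1 = ink), finds cells as the gaps between fully inked rows and columns, recognizes the cells in parallel, and writes each result back to its row and column. The names `line_runs`, `segment_cells`, `recognize_cell`, and `reconstruct_table` are hypothetical, and `recognize_cell` is a stub standing in for a real OCR engine. Borderless tables are not handled here.

```python
from concurrent.futures import ThreadPoolExecutor


def line_runs(flags):
    """Return (start, end) index pairs for the gaps between border lines."""
    runs, start = [], None
    for i, is_border in enumerate(flags):
        if not is_border and start is None:
            start = i                      # a cell region begins
        elif is_border and start is not None:
            runs.append((start, i))        # a border line closes the region
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def segment_cells(image):
    """Split a bordered table into (row, col, crop) cells.

    Assumption: image is a list of equal-length rows of 0/1 pixels, and a
    border is any row or column that is inked across its whole length.
    """
    border_rows = [all(row) for row in image]
    border_cols = [all(row[c] for row in image) for c in range(len(image[0]))]
    cells = []
    for r, (top, bottom) in enumerate(line_runs(border_rows)):
        for c, (left, right) in enumerate(line_runs(border_cols)):
            crop = [row[left:right] for row in image[top:bottom]]
            cells.append((r, c, crop))
    return cells


def recognize_cell(crop):
    """Hypothetical stand-in for an OCR engine: returns the ink pixel count."""
    return str(sum(map(sum, crop)))


def reconstruct_table(image, workers=4):
    """Recognize all cells concurrently and rebuild the row/column layout."""
    cells = segment_cells(image)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so texts line up with cells.
        texts = list(pool.map(recognize_cell, (crop for _, _, crop in cells)))
    n_rows = 1 + max(r for r, _, _ in cells)
    n_cols = 1 + max(c for _, c, _ in cells)
    table = [[""] * n_cols for _ in range(n_rows)]
    for (r, c, _), text in zip(cells, texts):
        table[r][c] = text
    return table
```

Because each cell is recognized independently, the work can be split across workers without coordination, and the stored row and column indices are enough to restore the table layout afterwards. A thread pool keeps the sketch simple; for a CPU-bound recognizer, a process pool from the same `concurrent.futures` module may be the better fit.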

Author information

Authors and Affiliations

National Institute of Science and Technology, Berhampur, Berhampur, India
Ashish Ranjan, Varun Nagesh Jolly Behera & Motahar Reza

Corresponding author

Correspondence to Ashish Ranjan.

Editor information

Editors and Affiliations

School of Computer Science and Engineering, National Institute of Science and Technology (Autonomous), Berhampur, Odisha, India
Santosh Kumar Das
School of Computer Science and Engineering, National Institute of Science and Technology (Autonomous), Berhampur, Odisha, India
Shom Prasad Das
Department of Information Technology, Techno India College of Technology, Kolkata, West Bengal, India
Nilanjan Dey
Founder and Head of the Egyptian Scientific Research Group (SRGE), Information Technology Department, Cairo University, Faculty of Computer and Artificial Intelligence, Giza, Egypt
Aboul-Ella Hassanien

About this chapter

Cite this chapter

Ranjan, A., Behera, V.N.J., Reza, M. (2021). OCR Using Computer Vision and Machine Learning. In: Das, S., Das, S., Dey, N., Hassanien, AE. (eds) Machine Learning Algorithms for Industrial Applications. Studies in Computational Intelligence, vol 907. Springer, Cham. https://doi.org/10.1007/978-3-030-50641-4_6

DOI: https://doi.org/10.1007/978-3-030-50641-4_6
Published: 19 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50640-7
Online ISBN: 978-3-030-50641-4
eBook Packages: Intelligent Technologies and Robotics (R0)