Abstract
Documents such as contracts, receipts, and sales invoices are proofs of transactions generated by the various functions of a business organization. Although some organizations have initiatives to digitize paper-based proofs of transactions, their business processes do not remove paper trails entirely. Organizations typically scan business documents, manually classify the digitized documents, and associate them with digital records in a database management system. As a result, digitization has introduced more work rather than more efficiency. This study seeks to eliminate the additional work brought about by the document digitization process. Specifically, it examines the application of image enhancement techniques and open-source Optical Character Recognition (OCR) technology to automatically classify business documents and associate them with digital records in a database management system. The study presents how an alternative document digitizer with an image enhancement feature is integrated into an accounting information system to facilitate the automatic classification of digitized documents and their association with specific database records. Applying image cropping and grayscale color processing as image enhancement techniques contributed to an average confidence level of 90% in extracting field labels and 91.5% in extracting field values from business documents.
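The two enhancement steps named in the abstract, grayscale conversion and cropping, and the averaging of per-word OCR confidence can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the BT.601 luma weights, the in-memory pixel grid, and the `(text, confidence)` word format are all assumptions made for illustration, and the OCR call itself is left out.

```python
from statistics import mean

# Assumption: ITU-R BT.601 luma weights, a common RGB-to-grayscale choice.
LUMA = (0.299, 0.587, 0.114)


def to_grayscale(pixels):
    """Convert a 2D grid of (r, g, b) tuples into a 2D grid of 0-255 ints."""
    return [
        [round(LUMA[0] * r + LUMA[1] * g + LUMA[2] * b) for (r, g, b) in row]
        for row in pixels
    ]


def crop(pixels, left, top, right, bottom):
    """Keep rows top..bottom-1 and columns left..right-1 of a pixel grid."""
    return [row[left:right] for row in pixels[top:bottom]]


def average_confidence(words):
    """Average the confidence of recognized words.

    `words` is a hypothetical list of (text, confidence) pairs, with
    confidence in percent; negative values mark entries without a score
    and are skipped.
    """
    scores = [conf for _, conf in words if conf >= 0]
    return mean(scores) if scores else 0.0


if __name__ == "__main__":
    # A tiny 2x2 RGB image standing in for a scanned document.
    image = [
        [(255, 255, 255), (0, 0, 0)],
        [(255, 0, 0), (0, 0, 255)],
    ]
    gray = to_grayscale(image)
    region = crop(gray, left=0, top=0, right=1, bottom=2)
    print(gray, region)

    # Hypothetical OCR output for a field label and its value.
    words = [("Invoice", 90.0), ("No", 92.0), ("", -1)]
    print(average_confidence(words))
```

In a real pipeline the cropped grayscale region would be passed to an OCR engine, and the averaged confidence could then be compared separately for field labels and field values.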
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jabonete, D.S., De Leon, M.M. (2022). Development of an Automatic Document to Digital Record Association Feature for a Cloud-Based Accounting Information System. In: Arai, K. (eds) Intelligent Computing. Lecture Notes in Networks and Systems, vol 283. Springer, Cham. https://doi.org/10.1007/978-3-030-80119-9_59
DOI: https://doi.org/10.1007/978-3-030-80119-9_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80118-2
Online ISBN: 978-3-030-80119-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)