Development of an Automatic Document to Digital Record Association Feature for a Cloud-Based Accounting Information System

  • Conference paper
Intelligent Computing

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 283)

Abstract

Documents such as contracts, receipts, and sales invoices are proofs of transactions generated by the various functions of a business organization. Although some organizations have initiatives to digitize paper-based proofs of transactions, their business processes do not remove paper trails entirely. Organizations typically scan transaction documents, manually classify the digitized documents, and then associate them with digital records in a database management system. As a result, digitization introduces more work rather than more efficiency. This study seeks to eliminate the additional work brought about by the document digitization process. Specifically, it examines the application of image enhancement techniques and open-source Optical Character Recognition (OCR) technology to automatically classify business documents and associate them with digital records in a database management system. The study presents how an alternative document digitizer and an image enhancement feature are integrated into an accounting information system to facilitate the automatic classification of digitized documents and their association with specific database records. Applying image cropping and grayscale color processing as image enhancement techniques contributed to an average confidence level of 90% in extracting field labels and 91.5% in extracting field values from business documents.
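
The pipeline summarized in the abstract (crop, convert to grayscale, run OCR, then associate the document with a record) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `label: value` line format, the `invoice no` key, and the record layout are all hypothetical, and the OCR step itself is omitted, so `ocr_text` stands in for whatever text an OCR engine would return.

```python
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255


def crop(image: List[list], left: int, top: int, right: int, bottom: int) -> List[list]:
    """Keep only rows top..bottom-1 and columns left..right-1."""
    return [row[left:right] for row in image[top:bottom]]


def to_grayscale(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert RGB pixels to one luma value each (ITU-R BT.601 weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


def parse_fields(ocr_text: str) -> Dict[str, str]:
    """Split hypothetical 'label: value' lines into a label -> value mapping."""
    fields: Dict[str, str] = {}
    for line in ocr_text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() and value.strip():
            fields[label.strip().lower()] = value.strip()
    return fields


def associate(fields: Dict[str, str], records: List[dict],
              key: str = "invoice no") -> Optional[int]:
    """Return the id of the record whose reference equals the extracted value."""
    wanted = fields.get(key)
    if wanted is None:
        return None
    for record in records:
        if record["reference"] == wanted:
            return record["id"]
    return None


# Usage with made-up data: a 1x2 RGB image and two database records.
gray = to_grayscale(crop([[(255, 255, 255), (0, 0, 0)]], 0, 0, 2, 1))
fields = parse_fields("Invoice No: INV-001\nTotal: 150.00")
records = [{"id": 3, "reference": "INV-000"}, {"id": 7, "reference": "INV-001"}]
record_id = associate(fields, records)
```

Matching on a single exact reference value is the simplest possible association rule and is assumed here only for illustration; a production system would likely need to tolerate OCR misreads.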

Author information

Corresponding author

Correspondence to Daniel S. Jabonete.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jabonete, D.S., De Leon, M.M. (2022). Development of an Automatic Document to Digital Record Association Feature for a Cloud-Based Accounting Information System. In: Arai, K. (ed.) Intelligent Computing. Lecture Notes in Networks and Systems, vol 283. Springer, Cham. https://doi.org/10.1007/978-3-030-80119-9_59
