
Ekush: A Multipurpose and Multitype Comprehensive Database for Online Off-Line Bangla Handwritten Characters

  • AKM Shahariar Azad Rabby
  • Sadeka Haque
  • Md. Sanzidul Islam
  • Sheikh Abujar
  • Syed Akhter Hossain
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1037)

Abstract

Ekush is the largest dataset of handwritten Bangla characters for research on handwritten Bangla character recognition. In recent years, machine learning and deep learning applications have attracted growing research interest, and handwriting recognition is one of the most significant of these applications because of its wide range of uses, such as Bangla OCR. Moreover, the Bangla script is one of the most widely used writing scripts in the world. For these reasons, we introduce a multipurpose, comprehensive dataset of Bangla handwritten characters. The proposed dataset contains Bangla modifiers, vowels, consonants, compound letters and numerical digits, comprising 367,018 isolated handwritten characters written by 3086 unique writers, all collected within Bangladesh. Because the samples were collected across a variety of districts and age groups, with an equal number of male and female writers, the dataset can also be used for other problems, e.g., gender-, age- and district-based handwriting research. It is intended to support the development of recognition techniques for handwritten Bangla characters. The dataset is freely available for any kind of academic research. EkushNet was trained and validated on the Ekush dataset and achieved a recognition accuracy of 97.73%, which is the best accuracy reported to date for Bangla character recognition. The Ekush dataset and relevant code can be found at this link: https://github.com/ShahariarRabby/ekush.
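To make the headline figures concrete, the sketch below derives the average number of samples each writer contributed from the totals stated in the abstract, and defines the accuracy metric. It assumes that the reported 97.73% is plain top-1 classification accuracy (correct predictions divided by total predictions); the abstract does not state this explicitly. The function names and the toy labels are hypothetical illustrations, not part of the Ekush code release.

```python
# Figures stated in the abstract
TOTAL_SAMPLES = 367_018
UNIQUE_WRITERS = 3_086


def mean_samples_per_writer(total: int, writers: int) -> float:
    """Average number of isolated characters contributed per writer."""
    if writers <= 0:
        raise ValueError("writers must be positive")
    return total / writers


def top1_accuracy(y_true, y_pred) -> float:
    """Fraction of samples whose predicted class equals the true class.

    Assumption: this is the metric behind the reported 97.73% figure.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    avg = mean_samples_per_writer(TOTAL_SAMPLES, UNIQUE_WRITERS)
    print(f"{avg:.1f} samples per writer")  # 118.9 samples per writer
    # Hypothetical toy class labels, not Ekush data
    print(top1_accuracy([0, 1, 2, 3], [0, 1, 2, 0]))  # 0.75
```

Under this reading, each writer contributed roughly 119 characters on average, and a 97.73% accuracy means about 977 of every 1,000 validation characters were assigned the correct class.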

Keywords

Bangla handwritten; Data science; Machine learning; Deep learning; Computer vision; Pattern recognition

Notes

Acknowledgement

We would like to express our deepest appreciation to all those who made it possible for us to complete this research at Daffodil International University. We give special gratitude to our university and to the Daffodil International University NLP and Machine Learning Research Lab for their guidance and support. We would also like to acknowledge that this research was partially supported by Notre Dame College, Mirpur Bangla School, Dhanmondi Govt. Girls’ High School, Shaheed Bir Uttam Lt. Anwar Girls’ College and Adamjee Cantonment Public School, which gave permission to collect data from their institutions. Any errors are our own and should not tarnish the reputations of these esteemed persons.


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • AKM Shahariar Azad Rabby (1, corresponding author)
  • Sadeka Haque (1)
  • Md. Sanzidul Islam (1)
  • Sheikh Abujar (1)
  • Syed Akhter Hossain (1)
  1. Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
