Ekush: A Multipurpose and Multitype Comprehensive Database for Online Off-Line Bangla Handwritten Characters
Ekush is the largest dataset of handwritten Bangla characters for research on handwritten Bangla character recognition. In recent years, applications of machine learning and deep learning have attracted growing research interest, and one of the most significant is handwriting recognition, which has important uses such as Bangla OCR. The Bangla script is also one of the most popular writing scripts in the world. For these reasons, we introduce a multipurpose, comprehensive dataset of Bangla handwritten characters. The proposed dataset contains Bangla modifiers, vowels, consonants, compound letters, and numerical digits, comprising 367,018 isolated handwritten characters written by 3086 unique writers, all collected within Bangladesh. The dataset can also be used for other problems, e.g., handwriting research based on gender, age, or district, because the samples cover a variety of districts and age groups and include an equal number of male and female writers. It is intended to support the development of recognition techniques for handwritten Bangla characters. The dataset is freely available for any kind of academic research. EkushNet was trained and validated on the Ekush dataset and achieved a recognition accuracy of 97.73%, which is, so far, the best accuracy for Bangla character recognition. The Ekush dataset and relevant code can be found at https://github.com/ShahariarRabby/ekush.
Keywords: Bangla handwritten, Data science, Machine learning, Deep learning, Computer vision, Pattern recognition
We would like to express our deepest appreciation to all those who made it possible for us to complete this research at Daffodil International University. We give special thanks to our university and to the Daffodil International University NLP and Machine Learning Research LAB for their guidance and support. Furthermore, we would like to acknowledge that this research was partially supported by Notre Dame College, Mirpur Bangla School, Dhanmondi Govt. Girls’ High School, Shaheed Bir Uttam Lt. Anwar Girls’ College, and Adamjee Cantonment Public School, which gave us permission to collect data at their institutions. Any errors are our own and should not tarnish the reputations of these esteemed persons.