Data analysis on music classification system and creating a sentiment word dictionary for Kokborok language

  • Sanchali Das
  • Sambit Satpathy
  • Swapan Debbarma
  • Bidyut K. Bhattacharyya
Original Research


Abstract

This work presents the development of a lexicon for Kokborok, a poorly resourced regional language of North East India that offers an entirely new base for research in the music information retrieval (MIR) field. We first create a sentiment word dictionary (a lexicon) and use it to develop a polarity classification system. The work is a text analysis of song lyrics involving two types of lyrical features: text stylistic (TS) features and features extracted from the newly developed dictionary. We also present a comparative analysis across various subsets of the music database based on their accuracy rates. After the system was developed, experiments were carried out and the results were analyzed computationally. We performed a linear extrapolation of the results obtained with both feature sets. As the number of songs tends to infinity, the accuracies for the two feature sets (the dictionary-based features and the TS features) are observed to converge at 52 and 39 percent, respectively. At present, it appears better to increase the number of features taken from the dictionary, since they give better accuracy for the low-resource language Kokborok.
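
The abstract does not spell out how the extrapolation to an infinite number of songs is carried out. One common way to obtain a finite limit is to fit accuracy linearly against 1/N, where N is the number of songs, and read off the intercept at 1/N = 0. The sketch below assumes that form; the function names and the numbers are hypothetical and are not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def extrapolate_accuracy(song_counts, accuracies):
    """Estimate the accuracy as the number of songs tends to infinity.

    Assumption (not from the paper): accuracy is roughly linear in 1/N,
    so the fitted intercept at 1/N = 0 is the limiting value.
    """
    inverse_counts = [1.0 / n for n in song_counts]
    intercept, _slope = fit_line(inverse_counts, accuracies)
    return intercept


# Synthetic, made-up data for illustration only (not results from the paper):
# accuracies are generated from 60 + 500 / N, so the true limit is 60.
song_counts = [50, 100, 200, 400]
accuracies = [60.0 + 500.0 / n for n in song_counts]
limit = extrapolate_accuracy(song_counts, accuracies)
```

With real measurements the points will not lie exactly on a line, so it is worth inspecting the fit residuals before trusting the intercept.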


Keywords

Linear extrapolation; Polarity classification; Sentiment word dictionary; Information retrieval; Data mining and analysis; Kokborok language



Acknowledgements

We thank the undergraduate students who helped with this research by annotating the dataset. We also thank the linguists who advised us on building the sentiment word dictionary.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Computer Science and Engineering, NIT Agartala, Agartala, India
  2. Electrical Engineering, NIT Agartala, Agartala, India
