A Group-Based Feature Selection Approach to Improve Classification of Holy Quran Verses

  • Abdullahi O. Adeleke
  • Noor Azah Samsudin
  • Aida Mustapha
  • Nazri Mohd Nawi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 700)


Most existing feature selection approaches are limited to selecting features from a single source of data. In this paper, a group-based feature selection (GBFS) approach is proposed that considers multiple sources of textual data. The proposed GBFS approach is applied to label Quranic verses based on two major references, the English translation and the tafsir (commentary). The verses were selected from two chapters, Surah Al-Baqarah and Surah Al-Anaam, and are classified into three categories: Faith, Worship, and Etiquette. The textual data from the translation and commentary were preprocessed using the StringToWordVector filter with TF-IDF weighting. Five feature selection algorithms (information gain, chi-square, Pearson correlation coefficient, Relief, and correlation-based feature selection) were evaluated with four classifiers: naïve Bayes, LibSVM, k-NN, and decision trees (J48). The proposed GBFS approach shows promising results in terms of accuracy and area under the receiver operating characteristic (ROC) curve (AUC), achieving an accuracy of 94.5% and an AUC of 0.944.
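The abstract does not spell out how features from the two sources are grouped, so the following is only a minimal sketch of one plausible reading: score terms separately within each source using chi-square, keep the top k per source, and concatenate their TF-IDF weights into one feature vector. The toy strings, the labels assigned to them, the value of k, and every function name (`tfidf`, `chi_square`, `select_top_k`, `combined_vectors`) are illustrative assumptions, not the authors' data or implementation, and the code uses plain Python rather than the toolkit named in the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; no stemming or stop words)."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: tf * log(N / df)} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    return [
        {t: (c / len(toks)) * math.log(n_docs / df[t])
         for t, c in Counter(toks).items()}
        for toks in tokenized
    ]


def chi_square(docs, labels, term, cls):
    """Chi-square statistic of a 2x2 table: term presence vs. membership in cls."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present = term in tokenize(doc)
        if present and label == cls:
            a += 1
        elif present:
            b += 1
        elif label == cls:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_top_k(docs, labels, k):
    """Rank terms by their best chi-square over all classes; ties break alphabetically."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    classes = sorted(set(labels))
    scores = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def combined_vectors(sources, selected):
    """Concatenate TF-IDF weights of the selected terms, one group per source."""
    weights = {name: tfidf(docs) for name, docs in sources.items()}
    n_docs = len(next(iter(sources.values())))
    return [
        [weights[name][i].get(t, 0.0) for name in sources for t in selected[name]]
        for i in range(n_docs)
    ]


# Invented placeholder text, NOT actual verses or tafsir passages.
sources = {
    "translation": [
        "belief in unseen", "prayer and charity", "kind speech to people",
        "belief in hereafter", "prayer at dawn", "fair speech in trade",
    ],
    "commentary": [
        "faith means trust", "worship through ritual", "manners of conversation",
        "faith and certainty", "worship before sunrise", "manners in commerce",
    ],
}
labels = ["Faith", "Worship", "Etiquette", "Faith", "Worship", "Etiquette"]

# Select features within each source (group) independently, then combine.
selected = {name: select_top_k(docs, labels, k=3) for name, docs in sources.items()}
vectors = combined_vectors(sources, selected)
feature_names = [f"{name}:{t}" for name in sources for t in selected[name]]

print(feature_names)
print(vectors[0])
```

The resulting `vectors` could then be passed to any classifier. Selecting within each source before combining keeps one source's vocabulary from crowding out the other, which is the motivation the abstract gives for considering multiple sources; whether the paper's GBFS works this way is not stated here.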


Keywords: Holy Quran; Text classification; Feature selection techniques; K nearest neighbor; Support vector machine; Naïve Bayes; Decision trees



This study was supported in part by a grant from the Ministry of Education of Malaysia, Research Acculturation Grant Scheme (RAGS) Vot R045, a grant from Universiti Tun Hussein Onn Malaysia Vot U611, and in part by a grant from Research Gates IT Solution Sdn. Bhd.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Abdullahi O. Adeleke (1)
  • Noor Azah Samsudin (1)
  • Aida Mustapha (1)
  • Nazri Mohd Nawi (1)
  1. Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Batu Pahat, Malaysia
