Text Classification with K-Nearest Neighbors Algorithm Using Gain Ratio

  • Manjari Singh Rathore
  • Praneet Saurabh
  • Ritu Prasad
  • Pradeep Mewada
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1119)


Text classification is the task of automatically assigning documents to categories from a predefined set; that is, it attaches predefined labels to free-text documents. This paper introduces a two-phase feature selection strategy for text classification, named content classification using data gain (CCDG), where "data gain" denotes information gain. In the first phase of CCDG, each term in a document is ranked by its importance for classification according to its information gain. In the second phase, a genetic algorithm (GA) and principal component analysis (PCA) are applied to select and extract the relevant features in order of decreasing impact. In this manner, terms of lesser importance can be discarded while only the impactful terms are retained. Experiments show encouraging results for the proposed CCDG, which performs better than conventional methods on all the datasets and test conditions considered.
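
The first phase described above, ranking terms by information gain, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the code assumes binary term presence per document, and the function names (`entropy`, `information_gain`, `gain_ratio`, `rank_terms`) and the four-document corpus are hypothetical. Gain ratio, which appears in the paper's title, is included as information gain divided by the entropy of the presence/absence split.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def information_gain(docs, labels, term):
    """Reduction in label entropy after splitting on presence of `term`.

    Assumption: each document is a set of terms (binary presence only).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = ((len(with_term) / total) * entropy(with_term)
                   + (len(without_term) / total) * entropy(without_term))
    return entropy(labels) - conditional


def gain_ratio(docs, labels, term):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy([term in d for d in docs])
    if split_info == 0:
        return 0.0
    return information_gain(docs, labels, term) / split_info


def rank_terms(docs, labels, score=information_gain):
    """Return the vocabulary sorted from most to least informative."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: score(docs, labels, t), reverse=True)


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"goal", "match", "team"},
    {"match", "score", "team"},
    {"election", "vote", "team"},
    {"election", "policy", "vote"},
]
labels = ["sport", "sport", "politics", "politics"]

ranking = rank_terms(docs, labels)
top_terms = ranking[:3]  # terms kept for the next phase
print(top_terms)
```

Passing `score=gain_ratio` to `rank_terms` ranks by gain ratio instead. In either case, the low-ranked terms are the ones a pipeline of this kind would discard before applying a second-phase reduction such as PCA.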


Text classification, Feature selection, PCA



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Manjari Singh Rathore (1)
  • Praneet Saurabh (2)
  • Ritu Prasad (1)
  • Pradeep Mewada (1)
  1. Technocrats Institute of Technology Advance, Bhopal, India
  2. Mody University of Science and Technology, Lakshmangarh, India
