The Decomposed K-Nearest Neighbor Algorithm for Imbalanced Text Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7709)

Abstract

As the volume of textual data has grown exponentially, attention has focused on the need to automatically classify relevant data into one of a set of pre-defined classes. Many practical applications assume that training data are evenly distributed among all classes, but in practice they suffer from the class imbalance problem. Several algorithms and re-sampling methods have been proposed to overcome this problem, but they still suffer from overfitting and information loss. This paper proposes the Decomposed K-Nearest Neighbor (DCM-KNN) algorithm. In the training step, DCM-KNN decomposes the training data into a misclassified set and a correctly classified set, based on the result of traditional KNN, and finds the appropriate KNN for each set. In the test step, DCM-KNN estimates whether a test instance is more similar to the misclassified set or to the correctly classified set, and applies the corresponding KNN. Experimental results show that the proposed algorithm can achieve more accurate results under imbalanced conditions.
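
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the "appropriate KNN" is chosen or how similarity to each set is estimated, so the sketch assumes (a) a leave-one-out pass of plain KNN to split the training data, (b) a per-set choice of k by leave-one-out accuracy, and (c) routing a test point by the set membership of its single nearest training point. The names `knn_predict`, `fit_dcm_knn`, `predict_dcm_knn`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k, skip=None):
    """Plain KNN: majority vote among the k nearest points (Euclidean)."""
    neighbors = sorted(
        (math.dist(x, query), i) for i, (x, _) in enumerate(train) if i != skip
    )[:k]
    votes = Counter(train[i][1] for _, i in neighbors)
    return votes.most_common(1)[0][0]


def fit_dcm_knn(train, base_k=3, candidate_ks=(5, 3, 1)):
    """Training step (assumed details): split by leave-one-out KNN, tune k per set."""
    wrong, right = [], []
    for i, (x, y) in enumerate(train):
        pred = knn_predict(train, x, base_k, skip=i)
        (right if pred == y else wrong).append(i)

    def best_k(indices):
        # Assumption: "appropriate KNN" means the k with the best
        # leave-one-out accuracy on this subset; ties keep the earlier candidate.
        if not indices:
            return base_k

        def hits(k):
            return sum(
                knn_predict(train, train[i][0], k, skip=i) == train[i][1]
                for i in indices
            )

        return max(candidate_ks, key=hits)

    return {
        "train": train,
        "wrong": wrong,
        "right": right,
        "k_wrong": best_k(wrong),
        "k_right": best_k(right),
    }


def predict_dcm_knn(model, query):
    """Test step (assumed details): route by the nearest training point's set."""
    train = model["train"]
    nearest = min(range(len(train)), key=lambda i: math.dist(train[i][0], query))
    k = model["k_wrong"] if nearest in model["wrong"] else model["k_right"]
    return knn_predict(train, query, k)


# Toy imbalanced data: six "neg" points near the origin, two "pos" points far away.
train = [
    ((0.0, 0.0), "neg"), ((0.0, 1.0), "neg"), ((1.0, 0.0), "neg"),
    ((1.0, 1.0), "neg"), ((0.5, 0.5), "neg"), ((0.2, 0.8), "neg"),
    ((5.0, 5.0), "pos"), ((5.0, 6.0), "pos"),
]
model = fit_dcm_knn(train)
query = (4.5, 5.0)
print(knn_predict(train, query, 5), predict_dcm_knn(model, query))
```

On this toy data, a single KNN with k = 5 is outvoted by the majority class near the minority cluster, whereas the decomposed variant falls back to a smaller k for queries that resemble the misclassified set. This only illustrates the idea; it says nothing about the results reported in the paper.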

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kang, HS., Nam, K., Kim, Si. (2012). The Decomposed K-Nearest Neighbor Algorithm for Imbalanced Text Classification. In: Kim, Th., Lee, Yh., Fang, Wc. (eds) Future Generation Information Technology. FGIT 2012. Lecture Notes in Computer Science, vol 7709. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35585-1_12

  • DOI: https://doi.org/10.1007/978-3-642-35585-1_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35584-4

  • Online ISBN: 978-3-642-35585-1

  • eBook Packages: Computer Science, Computer Science (R0)