Parallel Nearest Neighbour Algorithms for Text Categorization

  • Reynaldo Gil-García
  • José Manuel Badía-Contelles
  • Aurora Pons-Porrata
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4641)

Abstract

In this paper we describe the parallelization of two nearest neighbour classification algorithms. Nearest neighbour methods are well-known machine learning techniques that have been successfully applied to the text categorization task. Based on standard parallel techniques, we propose two versions of each algorithm for message passing architectures. We also include experimental results obtained on a cluster of personal computers using a large text collection. Our algorithms attempt to balance the load among the processors, are portable, and obtain very good speedups and scalability.
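The abstract does not say which parallel techniques the authors use, so the following is only a minimal sketch of one common approach, not the paper's method. It assumes the training set is split evenly among workers, each worker finds the k nearest neighbours within its own shard, and a root process merges the partial results and takes a majority vote. The workers are simulated sequentially here; the function names (`partition`, `local_top_k`, `classify`), the cosine similarity measure, and the toy documents are all illustrative assumptions.

```python
import heapq
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def partition(train, n_workers):
    """Round-robin split so every worker gets a near-equal share (assumed scheme)."""
    return [train[i::n_workers] for i in range(n_workers)]


def local_top_k(shard, query, k):
    """Work of a single worker: the k most similar documents in its shard."""
    scored = ((cosine(doc, query), label) for doc, label in shard)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def classify(train, query, k=3, n_workers=2):
    """Merge per-worker candidates and vote among the global k nearest."""
    shards = partition(train, n_workers)
    # On a message passing machine each local_top_k call would run in its
    # own process and the lists would be gathered at a root process; here
    # the workers are simulated one after another.
    candidates = [c for shard in shards for c in local_top_k(shard, query, k)]
    top = heapq.nlargest(k, candidates, key=lambda pair: pair[0])
    votes = Counter(label for _, label in top)
    return votes.most_common(1)[0][0]


# Hypothetical toy collection: (term-weight dict, category label).
TRAIN = [
    ({"goal": 1.0, "match": 1.0}, "sport"),
    ({"team": 1.0, "goal": 1.0}, "sport"),
    ({"stock": 1.0, "market": 1.0}, "finance"),
    ({"bank": 1.0, "market": 1.0}, "finance"),
]
```

Because every worker returns its own k best candidates, the merged list always contains the true global k nearest neighbours, so the predicted label does not depend on how many workers are used. Each worker sends only k (similarity, label) pairs, which keeps communication small relative to the similarity computations.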

Keywords

Parallel Algorithm, Near Neighbour, Parallel Version, Neighbour Classifier, Neighbour Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Reynaldo Gil-García (1)
  • José Manuel Badía-Contelles (2)
  • Aurora Pons-Porrata (1)
  1. Center of Pattern Recognition and Data Mining, Universidad de Oriente, Cuba
  2. Dpt. Computer Science and Engineering, Universitat Jaume I, Castellón, Spain
