Email Categorization with Tournament Methods

  • Yunqing Xia
  • Wei Liu
  • Louise Guthrie
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3513)

Abstract

This article proposes tournament methods for email categorization, in which the multi-class categorization process is broken down into a set of binary classification tasks. Two methods, the elimination tournament and the Round Robin tournament, are implemented and applied to classify emails into 15 folders. Extensive experiments compare the effectiveness and robustness of the tournament methods against the n-way classification method. The experimental results show that the tournament methods outperform the n-way method by 11.7% in precision, and that the Round Robin tournament performs slightly better than the elimination tournament on average.
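The abstract does not give the procedure in detail, so the following is only a minimal sketch of how a multi-class decision can be reduced to pairwise contests in the two tournament styles it names. The `Duel` callable, the `KEYWORDS` table, and the toy `keyword_duel` classifier are hypothetical stand-ins for a trained binary classifier and are not taken from the paper; the bye rule and the tie-breaking rule are likewise assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Set

# Assumed interface: a binary classifier that, given two candidate folders
# and an email, returns whichever of the two folders it prefers.
Duel = Callable[[str, str, str], str]


def elimination_tournament(folders: Sequence[str], email: str, duel: Duel) -> str:
    """Single elimination: winners advance until one folder remains."""
    remaining = list(folders)
    while len(remaining) > 1:
        next_round = []
        # Pair up neighbouring folders; each pair plays one binary contest.
        for i in range(0, len(remaining) - 1, 2):
            next_round.append(duel(remaining[i], remaining[i + 1], email))
        # With an odd count, the last folder advances unopposed (assumed bye rule).
        if len(remaining) % 2 == 1:
            next_round.append(remaining[-1])
        remaining = next_round
    return remaining[0]


def round_robin_tournament(folders: Sequence[str], email: str, duel: Duel) -> str:
    """Round robin: every pair of folders plays once; most wins takes the email."""
    wins: Dict[str, int] = {folder: 0 for folder in folders}
    for a, b in combinations(folders, 2):
        wins[duel(a, b, email)] += 1
    # Ties go to the folder listed first (assumed tie-breaking rule).
    return max(folders, key=lambda folder: wins[folder])


# Hypothetical toy data standing in for a trained pairwise classifier.
KEYWORDS: Dict[str, Set[str]] = {
    "work": {"meeting", "report"},
    "family": {"dinner", "mom"},
    "travel": {"flight", "hotel"},
}


def keyword_duel(a: str, b: str, email: str) -> str:
    """Prefer the folder whose keyword set overlaps the email more."""
    words = set(email.lower().split())
    score_a = len(words & KEYWORDS[a])
    score_b = len(words & KEYWORDS[b])
    return a if score_a >= score_b else b


folders = ["work", "family", "travel"]
email = "flight and hotel booking"
print(elimination_tournament(folders, email, keyword_duel))
print(round_robin_tournament(folders, email, keyword_duel))
```

In this sketch, a single-elimination tournament over n folders plays n - 1 contests per email, while a round robin plays n(n - 1)/2; for 15 folders that is 14 versus 105 binary decisions.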

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Yunqing Xia (1)
  • Wei Liu (2)
  • Louise Guthrie (2)
  1. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
  2. NLP Research Group, Department of Computer Science, University of Sheffield, Regent Court, Sheffield