An Approach to Hierarchical Email Categorization Based on ME

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4592)

Abstract

This paper proposes a hierarchical approach to categorizing emails with the Maximum Entropy (ME) model, based on their contents and attributes. The approach works in two phases. First, it divides emails into two sets, a legitimate set and a spam set; then it categorizes the emails within each set, using a different feature selection method for each. The paper also describes the pre-processing, the feature construction, and the ME model adapted to email categorization that are used to build the categorizer. Experimental results show that the hierarchical approach is more efficient than existing approaches and that feature selection is an important factor affecting the precision of email categorization.
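
The two-phase scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the ME model is fitted as a multinomial logistic (softmax) classifier by stochastic gradient ascent, and that feature selection is a simple document-frequency cutoff; the abstract does not name the training procedure or the feature selection methods. The function names, the sub-category labels ("work", "personal", "pharma", "scam") and the toy emails are all hypothetical.

```python
import math
from collections import Counter


def predict_proba(model, feats):
    """Return P(class | feats) under a softmax over summed feature weights."""
    classes, weights = model
    scores = {c: sum(weights[c].get(f, 0.0) for f in feats) for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def train_maxent(samples, labels, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the conditional log-likelihood."""
    classes = sorted(set(labels))
    weights = {c: {} for c in classes}
    for _ in range(epochs):
        for feats, gold in zip(samples, labels):
            probs = predict_proba((classes, weights), feats)
            for c in classes:
                # Gradient per active binary feature: observed minus expected.
                step = lr * ((1.0 if c == gold else 0.0) - probs[c])
                for f in feats:
                    weights[c][f] = weights[c].get(f, 0.0) + step
    return classes, weights


def vocab_by_doc_freq(docs, min_df):
    """Assumed feature selection: keep tokens seen in at least min_df emails."""
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok for tok, n in df.items() if n >= min_df}


def train_hierarchical(emails, min_df_by_set):
    """emails: list of (tokens, top_label, sub_label) triples."""
    # Phase 1: one model separates legitimate mail from spam.
    top_model = train_maxent([set(t) for t, _, _ in emails],
                             [top for _, top, _ in emails])
    # Phase 2: one model per set, each with its own feature selection setting.
    sub_models = {}
    for name, min_df in min_df_by_set.items():
        subset = [(t, sub) for t, top, sub in emails if top == name]
        vocab = vocab_by_doc_freq([t for t, _ in subset], min_df)
        model = train_maxent([set(t) & vocab for t, _ in subset],
                             [sub for _, sub in subset])
        sub_models[name] = (vocab, model)
    return top_model, sub_models


def classify(hier, tokens):
    """Pick the top-level set first, then the category within that set."""
    top_model, sub_models = hier
    feats = set(tokens)
    top_probs = predict_proba(top_model, feats)
    top_label = max(top_probs, key=top_probs.get)
    vocab, model = sub_models[top_label]
    sub_probs = predict_proba(model, feats & vocab)
    return top_label, max(sub_probs, key=sub_probs.get)


# Toy, made-up training data: (tokens, top-level set, category within the set).
EMAILS = [
    (["meeting", "project", "schedule"], "legit", "work"),
    (["project", "deadline", "report"], "legit", "work"),
    (["dinner", "family", "weekend"], "legit", "personal"),
    (["family", "birthday", "party"], "legit", "personal"),
    (["cheap", "pills", "buy"], "spam", "pharma"),
    (["pills", "discount", "buy"], "spam", "pharma"),
    (["win", "lottery", "money"], "spam", "scam"),
    (["money", "prize", "win"], "spam", "scam"),
]

# A different (assumed) selection threshold for each of the two sets.
model = train_hierarchical(EMAILS, {"legit": 2, "spam": 1})
print(classify(model, ["project", "meeting"]))
print(classify(model, ["win", "money", "prize"]))
```

In this sketch the second-phase models never compete across sets: an email routed to the spam set is scored only against spam categories, with only the features selected for that set. That is the property the hierarchical design relies on, whatever selection method is actually used.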

Editor information

Zoubida Kedad, Nadira Lammari, Elisabeth Métais, Farid Meziane, Yacine Rezgui

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, P., Li, J., Zhu, Q. (2007). An Approach to Hierarchical Email Categorization Based on ME. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73351-5_3

  • DOI: https://doi.org/10.1007/978-3-540-73351-5_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73350-8

  • Online ISBN: 978-3-540-73351-5

  • eBook Packages: Computer Science, Computer Science (R0)
