Abstract
Many classification techniques used to identify spam emails treat spam filtering as a binary classification problem: an incoming email is either spam or non-spam. This treatment is adopted more for mathematical simplicity than because it reflects the true state of nature. In this paper, we introduce a three-way decision approach to spam filtering based on Bayesian decision theory. It gives users more sensible feedback so that they can handle incoming emails with caution, thereby reducing the chance of misclassification. The main advantage of our approach is that it allows the possibility of rejection, i.e., of refusing to make a decision. The undecided cases must be re-examined by collecting additional information. A loss function is defined to state how costly each action is, a pair of threshold values on the posterior odds ratio is systematically calculated from the loss function, and the final decision is to select the action for which the overall cost is minimum. Our experimental results show that the new approach reduces the error rate of classifying a legitimate email as spam, and provides better spam precision and weighted accuracy.
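The decision rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss values, the notation, or how the posterior probability is estimated, so everything named here is an assumption made for illustration: the `LOSS` table and its numbers, the action labels, and the function names are hypothetical, and the posterior probability of spam is simply passed in as an input. The thresholds are derived by comparing the expected cost of each definite action with the expected cost of staying undecided, which is one standard way to obtain a pair of thresholds on the posterior odds and may differ in detail from the paper's own derivation.

```python
from typing import Dict, Tuple

# Hypothetical loss table: LOSS[action][true_class].
# Marking a legitimate email as spam is assumed to be the costliest error.
LOSS: Dict[str, Dict[str, float]] = {
    "spam":       {"spam": 0.0, "legitimate": 10.0},
    "undecided":  {"spam": 1.0, "legitimate": 1.0},
    "legitimate": {"spam": 3.0, "legitimate": 0.0},
}


def expected_costs(p_spam: float, loss=LOSS) -> Dict[str, float]:
    """Expected cost of each action given the posterior probability of spam."""
    return {
        action: row["spam"] * p_spam + row["legitimate"] * (1.0 - p_spam)
        for action, row in loss.items()
    }


def decide_min_cost(p_spam: float, loss=LOSS) -> str:
    """Select the action whose expected cost is minimum."""
    costs = expected_costs(p_spam, loss)
    return min(costs, key=costs.get)


def odds_thresholds(loss=LOSS) -> Tuple[float, float]:
    """Upper and lower thresholds on the posterior odds P(spam|x) / P(legit|x).

    upper: odds at which 'spam' becomes no costlier than 'undecided'.
    lower: odds at which 'legitimate' becomes no costlier than 'undecided'.
    """
    upper = (loss["spam"]["legitimate"] - loss["undecided"]["legitimate"]) / (
        loss["undecided"]["spam"] - loss["spam"]["spam"]
    )
    lower = (loss["undecided"]["legitimate"] - loss["legitimate"]["legitimate"]) / (
        loss["legitimate"]["spam"] - loss["undecided"]["spam"]
    )
    return upper, lower


def decide_by_odds(p_spam: float, loss=LOSS) -> str:
    """Three-way decision using the odds thresholds (assumes upper > lower)."""
    upper, lower = odds_thresholds(loss)
    odds = float("inf") if p_spam >= 1.0 else p_spam / (1.0 - p_spam)
    if odds >= upper:
        return "spam"
    if odds <= lower:
        return "legitimate"
    return "undecided"


if __name__ == "__main__":
    print(odds_thresholds())
    for p in (0.95, 0.5, 0.1):
        print(p, decide_min_cost(p), decide_by_odds(p))
```

With these assumed losses, an email is marked as spam only when the posterior odds of spam are at least 9 (a posterior probability of 0.9), it is accepted as legitimate when the odds are at most 0.5, and it is left undecided in between. The threshold rule and the minimum-cost rule agree whenever the upper threshold exceeds the lower one.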
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhou, B., Yao, Y., Luo, J. (2010). A Three-Way Decision Approach to Email Spam Filtering. In: Farzindar, A., Kešelj, V. (eds) Advances in Artificial Intelligence. Canadian AI 2010. Lecture Notes in Computer Science, vol 6085. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13059-5_6
DOI: https://doi.org/10.1007/978-3-642-13059-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13058-8
Online ISBN: 978-3-642-13059-5
eBook Packages: Computer Science (R0)