Clustering Ensemble for Spam Filtering

Porras, Santiago; Baruque, Bruno; Vaquerizo, Belén; Corchado, Emilio

doi:10.1007/978-3-642-21222-2_44

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6679)

Included in the following conference series:

International Conference on Hybrid Artificial Intelligence Systems

Abstract

One of the main problems that modern e-mail systems face is managing the large volume of spam, or junk mail, they receive. These systems are expected to distinguish between legitimate mail and spam, so that the end user is presented with as much relevant information as possible. This study presents a novel hybrid intelligent system that uses both unsupervised and supervised learning and can easily be adapted for use in an individual or a collaborative system. The system splits the spam filtering problem into two stages. First, it divides the input data space into several parts, each grouping similar messages. Then it generates several simple classifiers, each used to classify the messages that fall within one of the previously determined parts. In this way the efficiency of each classifier increases, since it can specialize in separating spam from a certain type of related messages. The hybrid system has been tested on a real e-mail database, and its results are compared with those obtained by other common classification methods. This novel hybrid technique proves to be effective for the problem under study.
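
The two-stage idea in the abstract (partition the input space, then train one simple classifier per part) can be illustrated with a minimal sketch. The abstract does not name the algorithms used, so everything here is an assumption made for illustration: plain k-means stands in for the clustering stage, a nearest-class-mean rule stands in for the "simple classifiers", messages are assumed to be already encoded as numeric feature vectors, and all function and class names are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def nearest(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i], x))


def kmeans(X, k, iters=20, seed=0):
    """Plain k-means (an assumed stand-in for the paper's clustering stage)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(X, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            groups[nearest(centers, x)].append(x)
        # Keep the previous center if a group ended up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


class CentroidClassifier:
    """Simple per-part classifier: predict the label of the nearest class mean."""

    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.means = [mean([x for x, t in zip(X, y) if t == lab])
                      for lab in self.labels]
        return self

    def predict(self, x):
        return self.labels[nearest(self.means, x)]


class ClusterThenClassify:
    """Stage 1: split the input space. Stage 2: one classifier per part."""

    def __init__(self, k, seed=0):
        self.k = k
        self.seed = seed

    def fit(self, X, y):
        self.centers = kmeans(X, self.k, seed=self.seed)
        assign = [nearest(self.centers, x) for x in X]
        self.experts = {}
        for i in range(self.k):
            idx = [j for j, a in enumerate(assign) if a == i]
            if idx:  # train an expert only on the messages in this part
                self.experts[i] = CentroidClassifier().fit(
                    [X[j] for j in idx], [y[j] for j in idx])
        # Majority label, used only if a part received no training data.
        self.fallback = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        expert = self.experts.get(nearest(self.centers, x))
        return expert.predict(x) if expert else self.fallback


# Toy data (hypothetical): two well-separated groups of messages, each with
# its own spam/ham boundary (along the first feature in one group, along the
# second feature in the other).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
y = ["ham", "ham", "spam", "spam", "spam", "spam", "ham", "ham"]
model = ClusterThenClassify(k=2).fit(X, y)
print(model.predict((0.1, 0.5)), model.predict((10.2, 10.1)))  # prints: ham spam
```

In this toy setting each per-part classifier only has to learn the boundary that matters inside its own group, which is the kind of specialization the abstract describes. The sketch makes no claim about how the reported system performs.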

Author information

Authors and Affiliations

Civil Engineering Department, University of Burgos, Spain
Santiago Porras, Bruno Baruque & Belén Vaquerizo
Departamento de Informática y Automática, Universidad de Salamanca, Spain
Emilio Corchado

Editor information

Editors and Affiliations

GICAP Research Group, University of Burgos, 09006, Burgos, Spain
Emilio Corchado
Wroclaw University of Technology, 50-370, Wroclaw, Poland
Marek Kurzyński & Michał Woźniak

About this paper

Cite this paper

Porras, S., Baruque, B., Vaquerizo, B., Corchado, E. (2011). Clustering Ensemble for Spam Filtering. In: Corchado, E., Kurzyński, M., Woźniak, M. (eds) Hybrid Artificial Intelligent Systems. HAIS 2011. Lecture Notes in Computer Science, vol 6679. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21222-2_44

DOI: https://doi.org/10.1007/978-3-642-21222-2_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21221-5
Online ISBN: 978-3-642-21222-2
eBook Packages: Computer Science, Computer Science (R0)
