Abstract
Email is the most widely used message transfer system on the internet. Spam is now a serious concern and causes major problems across today's internet. Spam emails are unsolicited messages sent indiscriminately to a large number of recipients. As email has grown in popularity, the volume of unsolicited messages has also risen rapidly and has led to many security concerns. Although a good number of spam filtering techniques exist, spammers keep devising new ways to evade spam filters. Spammers also use email to spread malware in the guise of executable files. Spam emails waste the user's system memory, computing power, and network bandwidth. They progressively damage the integrity of email and degrade the online experience. Research has shown that classification algorithms combined with feature selection yield more accurate results than standard classification alone. In this paper, feature selection is performed with minimum redundancy maximum relevance (mRMR), and classification is performed with Random Forests in a MapReduce environment. Using the Spambase dataset, performance is compared against Random Forests in the distributed environment on several measures, namely sensitivity, correctness, and accuracy.
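The mRMR step named in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's MapReduce implementation, which is not shown here. It assumes discrete (already binned) feature values and the common "difference" form of the criterion, in which each greedy step picks the feature whose mutual information with the label, minus its mean mutual information with the features already chosen, is largest. The function names `mutual_information` and `mrmr_select` and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(columns, labels, k):
    """Greedy mRMR (difference form, an assumption of this sketch).

    columns: list of feature columns, each a sequence of discrete values.
    Returns the indices of the k selected features in selection order.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = set(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]  # first pick: relevance only
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: 1 = spam, 0 = ham (hypothetical values, not from Spambase)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 0, 0, 0, 0, 0],  # feature 0: strongly related to the label
    [1, 1, 1, 0, 0, 0, 0, 0],  # feature 1: exact duplicate of feature 0
    [0, 1, 1, 1, 0, 0, 0, 1],  # feature 2: weaker, but mostly new information
]

selected = mrmr_select(columns, labels, k=2)
print(selected)  # [0, 2]
```

Ranking by relevance alone would keep features 0 and 1, which carry identical information. The redundancy penalty makes the sketch prefer feature 2 as the second pick. In a pipeline like the one the abstract describes, the selected columns would then be passed to the classifier.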
Acknowledgements
The authors sincerely thank the University Grants Commission (UGC), Hyderabad for granting funds to carry out this work.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sri Vinitha, V., Karthika Renuka, D. (2020). MapReduce mRMR: Random Forests-Based Email Spam Classification in Distributed Environment. In: Sharma, N., Chakrabarti, A., Balas, V. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1042. Springer, Singapore. https://doi.org/10.1007/978-981-32-9949-8_18
DOI: https://doi.org/10.1007/978-981-32-9949-8_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9948-1
Online ISBN: 978-981-32-9949-8
eBook Packages: Engineering, Engineering (R0)