MapReduce mRMR: Random Forests-Based Email Spam Classification in Distributed Environment

  • Conference paper
Data Management, Analytics and Innovation

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1042))

Abstract

Email is the most widely used message transfer system on the internet. Spam is now a serious concern that causes major problems for internet users. Spam emails are unsolicited messages sent indiscriminately to a large number of recipients. As email has grown in popularity, the volume of unsolicited messages has also risen rapidly, raising many security concerns. Although many spam filtering techniques exist, spammers keep finding new ways to evade the filters. Spammers also use email to spread malware disguised as executable files. Spam emails waste users' system memory, computing power, and network bandwidth, and they progressively damage the integrity of email and degrade the online experience. Prior research has shown that classification algorithms combined with feature selection return more accurate results than classification alone. In this paper, feature selection is done through minimum redundancy and maximum relevance (mRMR), and classification is done by means of Random Forests in the MapReduce environment. The performance is compared with that of Random Forests in the distributed environment on the Spambase dataset, using sensitivity, correctness, and accuracy as measures.
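As a rough illustration of the feature selection step named in the abstract, the following is a minimal single-machine sketch of greedy mRMR over discrete features. It is not the authors' MapReduce implementation, which is not visible on this page. It assumes the common "difference" form of the criterion (mutual information with the label minus mean mutual information with the features already selected), and the feature names and toy values are hypothetical rather than taken from Spambase.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedy mRMR, difference form (an assumed variant): repeatedly pick the
    feature with the highest relevance to the labels minus its mean redundancy
    with the features chosen so far."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in columns.items()}
    selected = []
    remaining = set(columns)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(columns[name], columns[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 1 = spam, 0 = ham; features are binary indicators.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "word_free":     [1, 1, 1, 0, 0, 0, 0, 0],  # strongly related to the label
    "word_free_dup": [1, 1, 1, 0, 0, 0, 0, 0],  # exact copy, fully redundant
    "char_dollar":   [1, 0, 0, 1, 0, 0, 0, 0],  # weaker but complementary
}

chosen = mrmr_select(columns, labels, k=2)
print(chosen)
```

In this toy case the duplicated feature is passed over in favour of the weaker but less redundant one, which is the behaviour the redundancy term is meant to produce. In a distributed setting, the counting inside `mutual_information` is the part that would typically be spread across mappers and reducers.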

Acknowledgements

The authors sincerely thank the University Grants Commission (UGC), Hyderabad for granting funds to carry out this work.

Author information

Corresponding author

Correspondence to V. Sri Vinitha.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sri Vinitha, V., Karthika Renuka, D. (2020). MapReduce mRMR: Random Forests-Based Email Spam Classification in Distributed Environment. In: Sharma, N., Chakrabarti, A., Balas, V. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1042. Springer, Singapore. https://doi.org/10.1007/978-981-32-9949-8_18

  • DOI: https://doi.org/10.1007/978-981-32-9949-8_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-32-9948-1

  • Online ISBN: 978-981-32-9949-8

  • eBook Packages: Engineering, Engineering (R0)
