Comparing Supervised Learning Classifiers to Detect Advanced Fee Fraud Activities on Internet

  • Conference paper

Abstract

Because of its inherent vulnerability, the Internet is frequently abused for criminal activities such as Advanced Fee Fraud (AFF). At present, it is difficult to accurately detect the activities of AFF fraudsters on the Internet. To address this, we compare the classification accuracies of four supervised learning methods: Binary Logistic Regression (BLR), Back-propagation Neural Network (BNN), Naive Bayesian Classifier (NBC) and Support Vector Machine (SVM). A word clustering method (globalCM) is used to create clusters of the words present in the training dataset, and a Vector Space Model (VSM) is computed from the words in each e-mail in the training set. The WEKA data mining framework is used to build supervised learning classifiers from the set of VSMs with each learning method. Classification accuracies are estimated using stratified 10-fold cross-validation. The results generally show that an SVM with a polynomial kernel gives the best classification accuracy. This study contributes to the problem of detecting unwanted e-mails, and the comparison of learning methods can help a decision maker weigh tradeoffs between method accuracy and complexity.
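
The evaluation pipeline described in the abstract (e-mails turned into term vectors, a classifier trained on them, accuracy estimated by stratified 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' WEKA setup: it uses plain Python, a crude regular-expression tokenizer in place of globalCM word clustering, raw term counts as the vector space representation, and a hand-written multinomial Naive Bayes as a stand-in for the NBC. The function names and the toy e-mails are hypothetical and exist only for illustration.

```python
import math
import random
import re
from collections import Counter, defaultdict


def to_vector(text, vocabulary):
    """Term-frequency vector of one e-mail over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[w] for w in vocabulary]


def stratified_folds(labels, k, seed=0):
    """Assign example indices to k folds while preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)  # deal each class round-robin
    return folds


def train_nb(vectors, labels):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    n_features = len(vectors[0])
    class_counts = Counter(labels)
    term_counts = {y: [0] * n_features for y in class_counts}
    for vec, y in zip(vectors, labels):
        for j, c in enumerate(vec):
            term_counts[y][j] += c
    model = {}
    for y, counts in term_counts.items():
        total = sum(counts) + n_features
        log_prior = math.log(class_counts[y] / len(labels))
        log_like = [math.log((c + 1) / total) for c in counts]
        model[y] = (log_prior, log_like)
    return model


def predict_nb(model, vec):
    """Return the class with the highest log-posterior score."""
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(c * ll for c, ll in zip(vec, log_like))
    return max(model, key=score)


def cross_validate(vectors, labels, k=10):
    """Accuracy pooled over k stratified folds."""
    correct = 0
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train = [i for i in range(len(labels)) if i not in held_out]
        model = train_nb([vectors[i] for i in train],
                         [labels[i] for i in train])
        correct += sum(predict_nb(model, vectors[i]) == labels[i]
                       for i in test_idx)
    return correct / len(labels)


# Hypothetical toy corpus (NOT the paper's dataset): 10 e-mails per class.
emails = (["Urgent transfer of fund requires your bank fee"] * 10
          + ["Meeting agenda for project review attached"] * 10)
labels = ["fraud"] * 10 + ["ham"] * 10
vocabulary = sorted({w for e in emails
                     for w in re.findall(r"[a-z]+", e.lower())})
vectors = [to_vector(e, vocabulary) for e in emails]
accuracy = cross_validate(vectors, labels, k=10)
print(f"stratified 10-fold accuracy: {accuracy:.2f}")
```

Stratification matters here because each fold should contain fraud and legitimate e-mails in roughly the same proportion as the full dataset; otherwise a fold could end up with a single class and distort the accuracy estimate. To compare several learning methods as the paper does, the same folds would be reused for every classifier.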

Copyright information

© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Modupe, A., Olugbara, O.O., Ojo, S.O. (2012). Comparing Supervised Learning Classifiers to Detect Advanced Fee Fraud Activities on Internet. In: Meghanathan, N., Chaki, N., Nagamalai, D. (eds) Advances in Computer Science and Information Technology. Computer Science and Information Technology. CCSIT 2012. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 86. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27317-9_10

  • DOI: https://doi.org/10.1007/978-3-642-27317-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27316-2

  • Online ISBN: 978-3-642-27317-9

  • eBook Packages: Computer Science (R0)
