Abstract
Because of its inherent vulnerability, the internet is frequently abused for criminal activities such as Advanced Fee Fraud (AFF). At present, it is difficult to accurately detect the activities of AFF defrauders on the internet. To address this problem, we compare the classification accuracies of four supervised learning methods: Binary Logistic Regression (BLR), Back-propagation Neural Network (BNN), Naive Bayesian Classifier (NBC) and Support Vector Machine (SVM). A word clustering method (globalCM) is used to create clusters of the words present in the training dataset. A Vector Space Model (VSM) is computed from the words in each e-mail in the training set. The WEKA data mining framework is used to build supervised learning classifiers from the set of VSMs with each of the learning methods. Experiments use stratified 10-fold cross-validation to estimate the classification accuracy of each classifier. The results generally show that an SVM with a polynomial kernel gives the best classification accuracy. This study contributes to the problem of detecting unwanted e-mails, and the comparison of learning methods can help a decision maker weigh the tradeoff between method accuracy and complexity.
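Two steps named in the abstract, building a vector space model from e-mail text and splitting data into stratified folds, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the study used WEKA and the globalCM word clustering method, whereas this sketch uses plain term counts and a simple round-robin split. The function names `build_vsm` and `stratified_folds` and the sample e-mails are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_vsm(emails):
    """Turn each e-mail into a term-frequency vector over a shared vocabulary.

    Assumption: whitespace tokenisation and raw counts; the paper's word
    clustering (globalCM) is not reproduced here.
    """
    tokenized = [email.lower().split() for email in emails]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds so each fold keeps similar class ratios."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin so every fold gets a share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


if __name__ == "__main__":
    vocabulary, vectors = build_vsm(["send money now", "meeting now"])
    print(vocabulary)
    print(vectors)

    labels = ["fraud"] * 20 + ["ham"] * 30
    for fold in stratified_folds(labels, k=10):
        print(sorted(fold))
```

In a cross-validation run, each fold would serve once as the test set while the remaining nine folds train the classifier, and the ten accuracy scores would be averaged.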
Copyright information
© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Modupe, A., Olugbara, O.O., Ojo, S.O. (2012). Comparing Supervised Learning Classifiers to Detect Advanced Fee Fraud Activities on Internet. In: Meghanathan, N., Chaki, N., Nagamalai, D. (eds) Advances in Computer Science and Information Technology. Computer Science and Information Technology. CCSIT 2012. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 86. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27317-9_10
DOI: https://doi.org/10.1007/978-3-642-27317-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27316-2
Online ISBN: 978-3-642-27317-9
eBook Packages: Computer Science, Computer Science (R0)