Distributed classification for image spam detection

Amir, Amiza; Srinivasan, Bala; Khan, Asad I.

doi:10.1007/s11042-017-4944-y

Published: 01 July 2017

Volume 77, pages 13249–13278, (2018)
Multimedia Tools and Applications

Amiza Amir¹,
Bala Srinivasan² &
Asad I. Khan²

Abstract

Spam appears in various forms, and the current trend in spamming is moving towards multimedia spam objects. Image spam is a new type of spam attack that attempts to bypass spam filters, which are mostly text-based. Spamming attacks users in many ways, and these attacks are usually countered by having a server filter out the spammers. This paper provides a fully distributed pattern recognition system within P2P networks that uses the distributed associative memory tree (DASMET) algorithm to detect spam; the system is cost-efficient and, unlike server-based systems, not prone to a single point of failure. The algorithm is scalable for large and frequently updated data sets, and is specifically designed for data sets that consist of similar occurring patterns. We have evaluated our system against centralised state-of-the-art algorithms (NN, k-NN, naive Bayes, BPNN and RBFN) and distributed P2P-based algorithms (Ivote-DPV, ensemble k-NN, ensemble naive Bayes, and P2P-GN). The experimental results show that our method is highly accurate, with a 98 to 99% accuracy rate, and incurs a small number of messages: in the best case, it requires only two messages per recall test. In summary, our experimental results show that, compared to other distributed methods, DASMET performs best for spam detection while using a relatively small amount of resources.

Notes

The term feature is used interchangeably with the term attribute in this paper.
Recall refers to classification or prediction in this context.
The input for generating a bias identifier at a leaf node is a raw sub-pattern, while the input at an internal node and the root node is a sequence of combined identifiers of its child nodes.
The lookup process only involves a single hop message.
True positive rate is equivalent to recall in the content retrieval context. However, we do not use the term recall here since, in the context of this paper, recall refers to prediction.
F-measure is also known as F-score or F₁ score.
Every peer knows the location of all the others, so that direct connections among them can be established.
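
The notes on true positive rate and F-measure can be made concrete with a short sketch. It uses the standard textbook definitions; the function names and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def true_positive_rate(tp: int, fn: int) -> float:
    """TP / (TP + FN); called 'recall' in the content retrieval context."""
    total = tp + fn
    return tp / total if total else 0.0


def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP)."""
    total = tp + fp
    return tp / total if total else 0.0


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and true positive rate."""
    p = precision(tp, fp)
    r = true_positive_rate(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts for a spam classifier.
    tp, fp, fn = 90, 10, 10
    print("TPR:", true_positive_rate(tp, fn))
    print("F-measure:", f_measure(tp, fp, fn))
```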

Acknowledgement

The research reported in this paper is supported by the Research Acculturation Grant Scheme (RAGS) 9018-00080. The authors would also like to express gratitude to the Malaysian Ministry of Higher Education (MOHE) and Universiti Malaysia Perlis (UniMAP) for the facilities provided.

Author information

Authors and Affiliations

School of Computer and Communication Engineering, Universiti Malaysia Perlis, Perlis, Malaysia
Amiza Amir
Faculty of Information Technology, Monash University, Melbourne, Australia
Bala Srinivasan & Asad I. Khan

Corresponding author

Correspondence to Amiza Amir.

Appendix A: Other Algorithms

A.1 Tree construction

Algorithm 3 is executed to generate a logical DASMET tree. The process of constructing the logical tree is recursive and starts from the root node. Let level be 0, $\widehat{X}=\{\hat{x}_{i}\}_{i=1}^{n_{H}}$ be the set of sub-patterns, $w$ be the number of sub-patterns ($n_{H}$), $m$ be the maximum number of children of each node, and $d_{s}$ be the segment size at a leaf node. Note that $m$ in this algorithm is equal to $\varphi_{s}$. All segments in $\widehat{X}$ are initially assigned to the root node $V$, which executes the steps of the function constructTree(level, $w,\widehat{X},m$, H) given in Algorithm 3.

The node first determines whether it should expand the tree. If $w$ is less than or equal to $m$, the node creates $w$ leaf nodes and assigns one segment to each leaf node; this completes the process. Otherwise, it determines the number of children $n_{c}$ using Eq. (9) below.

$$ n_{c} = \begin{cases} m & \text{if } \left\lfloor \frac{w}{m} \right\rfloor \geq m \\ \left\lfloor \frac{w}{m} \right\rfloor + 1 & \text{otherwise} \end{cases} \qquad (9) $$

Next, it creates $n_{c}$ child nodes and distributes the available segments among them using a greedy approach. Upon receiving its $w$ segments from its parent, every child node then executes Algorithm 3. This process is repeated recursively until $w \leq m$.
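
The recursion described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' Algorithm 3: the `Node` class and the helper names are hypothetical, the argument H of constructTree is omitted because its role is not described in this excerpt, $w$ is taken as the length of the segment list, and the greedy distribution (whose details are not given here) is replaced by a near-even contiguous split. The sketch also assumes $m \geq 2$, since the recursion would not shrink otherwise.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Sequence


@dataclass
class Node:
    """Hypothetical logical tree node; leaves hold exactly one segment."""
    level: int
    segments: List[object] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def num_children(w: int, m: int) -> int:
    """Number of children n_c as given by Eq. (9)."""
    q = w // m
    return m if q >= m else q + 1


def split_evenly(segments: Sequence[object], n_c: int) -> List[List[object]]:
    """Assumed stand-in for the greedy step: contiguous, near-even chunks."""
    base, extra = divmod(len(segments), n_c)
    chunks, start = [], 0
    for i in range(n_c):
        size = base + (1 if i < extra else 0)
        chunks.append(list(segments[start:start + size]))
        start += size
    return chunks


def construct_tree(level: int, segments: Sequence[object], m: int) -> Node:
    """Recursively build the logical tree for the given segments."""
    if m < 2:
        raise ValueError("this sketch assumes m >= 2")
    w = len(segments)
    node = Node(level=level)
    if w <= m:
        # Stop expanding: create w leaf nodes, one segment per leaf.
        node.children = [Node(level=level + 1, segments=[s]) for s in segments]
        return node
    # Otherwise create n_c children and recurse on each child's share.
    for chunk in split_evenly(segments, num_children(w, m)):
        node.children.append(construct_tree(level + 1, chunk, m))
    return node


def leaves(node: Node) -> Iterator[Node]:
    """Yield leaf nodes from left to right."""
    if node.is_leaf:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)
```

For example, with $w = 10$ and $m = 4$, Eq. (9) gives $\lfloor 10/4 \rfloor = 2 < 4$, so the root creates $n_{c} = 3$ children; each child then receives at most four segments and creates its leaf nodes directly.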

A.2 Generate Identifier

About this article

Amir, A., Srinivasan, B. & Khan, A.I. Distributed classification for image spam detection. Multimed Tools Appl 77, 13249–13278 (2018). https://doi.org/10.1007/s11042-017-4944-y

Received: 30 June 2016
Revised: 06 March 2017
Accepted: 13 June 2017
Published: 01 July 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s11042-017-4944-y
