
Distributed classification for image spam detection

Multimedia Tools and Applications

Abstract

Spam appears in various forms, and the current trend in spamming is moving towards multimedia spam objects. Image spam is a new type of spam attack that attempts to bypass spam filters, which are mostly text-based. Spamming attacks users in many ways, and these attacks are usually countered by having a server filter the spammers. This paper provides a fully distributed pattern recognition system within P2P networks that uses the distributed associative memory tree (DASMET) algorithm to detect spam. The system is cost-efficient and, unlike server-based systems, not prone to a single point of failure. The algorithm is scalable for large and frequently updated data sets, and is specifically designed for data sets that consist of similar occurring patterns. We have evaluated our system against centralised state-of-the-art algorithms (NN, k-NN, naive Bayes, BPNN and RBFN) and distributed P2P-based algorithms (Ivote-DPV, ensemble k-NN, ensemble naive Bayes, and P2P-GN). The experimental results show that our method is highly accurate, with a 98 to 99% accuracy rate, and incurs a small number of messages: in the best case, it requires only two messages per recall test. In summary, our experimental results show that DASMET performs best, with a relatively small amount of resources for spam detection, compared to other distributed methods.


Notes

  1. The term feature is used interchangeably with the term attribute in this paper.

  2. Recall refers to classification or prediction in this context.

  3. The input for generating a bias identifier at a leaf node is a raw sub-pattern, while the input at an internal node and the root node is a sequence of combined identifiers of its child nodes.

  4. The lookup process only involves a single hop message.

  5. True positive rate is equivalent to recall in the content retrieval context. However, we do not use the term recall here since, in the context of this paper, recall refers to prediction.

  6. F-measure is also known as F-score or \(F_{1}\) score.

  7. Every peer knows the location of all the others, so that direct connections among them can be established.


Acknowledgement

The research reported in this paper is supported by the Research Acculturation Grant Scheme (RAGS) 9018-00080. The authors would also like to express gratitude to the Malaysian Ministry of Higher Education (MOHE) and University Malaysia Perlis (UniMAP) for the facilities provided.

Author information


Corresponding author

Correspondence to Amiza Amir.

Appendix A: Other Algorithms


A.1 Tree construction

Algorithm 3 is executed to generate a logical DASMET tree. The process of constructing the logical tree is recursive and starts from the root node. Let level be 0, \(\widehat {X}=\{\hat {x}_{i}\}_{i=1}^{n_{h}}\) be a set of sub-patterns, \(w\) be the number of sub-patterns (\(n_{H}\)), \(m\) be the maximum number of children of each node and \(d_{s}\) be the segment size at a leaf node. Note that \(m\) in this algorithm is equal to \(\varphi_{s}\). All segments in \(\widehat {X}\) are initially assigned to the root node \(V\), which then executes the steps of the function constructTree(level, \(w,\widehat {X},m\), H) given in Algorithm 3.

[Figure f: algorithm listing, not reproduced]

The node first determines whether it should expand the tree. If \(w\) is less than or equal to \(m\), the node creates \(w\) leaf nodes and assigns one segment to each leaf node, which completes the process. Otherwise, it determines the number of children \(n_{c}\) using (9):

$$ n_{c} = \begin{cases} m & \text{if } \left\lfloor \frac{w}{m} \right\rfloor \geq m \\ \left\lfloor \frac{w}{m} \right\rfloor + 1 & \text{otherwise} \end{cases} \tag{9} $$
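As a quick check of (9), two worked cases may help. The values of \(w\) and \(m\) below are chosen purely for illustration and are not taken from the paper's experiments.

$$ w = 10,\ m = 3:\quad \left\lfloor \tfrac{10}{3} \right\rfloor = 3 \geq 3 \;\Rightarrow\; n_{c} = m = 3 $$

$$ w = 5,\ m = 3:\quad \left\lfloor \tfrac{5}{3} \right\rfloor = 1 < 3 \;\Rightarrow\; n_{c} = 1 + 1 = 2 $$

In both cases \(n_{c}\) never exceeds \(m\), so the fan-out bound on each node is respected.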

Next, it creates \(n_{c}\) child nodes and distributes the available segments among these child nodes using a greedy approach. Upon receiving \(w\) segments from its parent, every child node then executes Algorithm 3 in turn. This process is repeated recursively until \(w \leq m\).
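The recursion described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' Algorithm 3: the `Node` class, the function names, and the `leaves` helper are hypothetical. The text does not spell out the greedy distribution rule, so the sketch assumes each child simply takes an even share of the segments that remain. The `H` argument of constructTree is omitted because it is not defined in this appendix.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """Hypothetical logical tree node; only leaves hold a segment."""
    level: int
    segments: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def num_children(w: int, m: int) -> int:
    """Eq. (9): number of children for a node holding w > m segments."""
    q = w // m
    return m if q >= m else q + 1


def construct_tree(level: int, segments: Sequence[str], m: int) -> Node:
    """Recursively build the logical tree for the given segments."""
    if m < 2:
        # With m = 1 a child would receive all w segments and never shrink.
        raise ValueError("m must be at least 2 for the recursion to terminate")
    w = len(segments)
    node = Node(level)
    if w <= m:
        # Stop expanding: create w leaves, one segment per leaf.
        node.children = [Node(level + 1, [s]) for s in segments]
        return node
    n_c = num_children(w, m)
    start = 0
    for i in range(n_c):
        # Assumed greedy rule: take an even share of what is left.
        share = -(-(w - start) // (n_c - i))  # ceiling division
        child = construct_tree(level + 1, segments[start:start + share], m)
        node.children.append(child)
        start += share
    return node


def leaves(node: Node) -> List[str]:
    """Collect leaf segments from left to right."""
    if not node.children:
        return list(node.segments)
    return [s for child in node.children for s in leaves(child)]
```

For example, `construct_tree(0, [f"x{i}" for i in range(10)], 3)` gives the root three children, and `leaves` returns the ten segments in their original order. A different greedy rule would change how segments are grouped but not the stopping condition \(w \leq m\).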

A.2 Generate Identifier

[Figure g: algorithm listing, not reproduced]


Cite this article

Amir, A., Srinivasan, B. & Khan, A.I. Distributed classification for image spam detection. Multimed Tools Appl 77, 13249–13278 (2018). https://doi.org/10.1007/s11042-017-4944-y


  • DOI: https://doi.org/10.1007/s11042-017-4944-y
