Authorship verification applied to detection of compromised accounts on online social networks

A continuous approach

Multimedia Tools and Applications

Abstract

Compromising legitimate accounts has been the most used strategy for spreading malicious content on Online Social Networks (OSNs). To address this problem, we propose a pure text mining approach that checks whether an account has been compromised based on the content of its posts. In the first step, the proposed approach extracts the writing style from the user account. The second step applies the k-Nearest Neighbors algorithm (k-NN) to evaluate the post content and identify the user. Finally, the third step, Baseline Updating, continuously updates the user baseline to accommodate current trends and seasonality in the user's posts. Experiments were carried out using a Twitter dataset composed of tweets from 1000 users. All three steps were evaluated individually, and the results show that the developed method is stable and can detect compromised accounts. An important observation is the contribution of Baseline Updating, which leads to an accuracy improvement of more than 60 %. In terms of average accuracy, the developed method achieved results above 93 %.
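
The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the stylometric features, the distance measure, the value of k, or the update policy, so everything concrete here is an assumption chosen for illustration. Character bigram frequencies stand in for the writing style, cosine distance is used for k-NN, and the baseline is a fixed-size sliding window that only absorbs posts attributed to the account owner. The names `style_vector`, `cosine_distance`, `knn_predict`, and `AccountMonitor` are hypothetical.

```python
import math
from collections import Counter, deque


def style_vector(text, n=2):
    """Step 1 (assumed features): relative frequencies of character n-grams."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams) or 1  # avoid division by zero on very short posts
    return {gram: count / total for gram, count in Counter(grams).items()}


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(gram, 0.0) for gram, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(post, baselines, k=3):
    """Step 2: majority vote among the k baseline posts closest to `post`.

    `baselines` maps a user name to an iterable of style vectors.
    """
    vec = style_vector(post)
    scored = sorted(
        (cosine_distance(vec, ref), user)
        for user, refs in baselines.items()
        for ref in refs
    )
    votes = Counter(user for _, user in scored[:k])
    return votes.most_common(1)[0][0]


class AccountMonitor:
    """Step 3 (assumed policy): sliding-window baseline for one account."""

    def __init__(self, owner, posts_by_user, window=50, k=3):
        self.owner = owner
        self.k = k
        # Each user keeps at most `window` recent style vectors.
        self.baselines = {
            user: deque((style_vector(p) for p in posts), maxlen=window)
            for user, posts in posts_by_user.items()
        }

    def check(self, post):
        """Return True if the post is attributed to the account owner.

        Accepted posts are appended to the owner's baseline, so the oldest
        entry drops out once the window is full. Rejected posts are not
        added, which keeps suspicious content out of the baseline.
        """
        accepted = knn_predict(post, self.baselines, self.k) == self.owner
        if accepted:
            self.baselines[self.owner].append(style_vector(post))
        return accepted
```

In this sketch a monitor is built from a few known posts per user, and each new post on the monitored account is passed to `check`; a `False` result flags the post as possibly written by someone other than the owner. Updating the baseline only with accepted posts is one simple design choice among several, and it trades faster adaptation for lower risk of learning an attacker's style.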

Author information

Corresponding author

Correspondence to Sylvio Barbon Jr.

About this article

Cite this article

Barbon, S., Igawa, R.A. & Bogaz Zarpelão, B. Authorship verification applied to detection of compromised accounts on online social networks. Multimed Tools Appl 76, 3213–3233 (2017). https://doi.org/10.1007/s11042-016-3899-8

  • DOI: https://doi.org/10.1007/s11042-016-3899-8