Skip to main content

Forecasting Suspicious Account Activity at Large-Scale Online Service Providers

  • Conference paper
  • First Online:
Financial Cryptography and Data Security (FC 2019)

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 11598))

Included in the following conference series:

Abstract

In the face of large-scale automated social engineering attacks to large online services, fast detection and remediation of compromised accounts are crucial to limit the spread of the attack and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning: we develop an early warning system that harnesses account activity traces to predict which accounts are likely to be compromised in the future. We demonstrate the feasibility and applicability of the system through an experiment at a large-scale online service provider using four months of real-world production data encompassing hundreds of millions of users. We show that—even limiting ourselves to login data only in order to derive features with low computational cost, and a basic model selection approach—our classifier can be tuned to achieve good classification precision when used for forecasting. Our system correctly identifies up to one month in advance the accounts later flagged as suspicious with precision, recall, and false positive rates that indicate the mechanism is likely to prove valuable in operational settings to support additional layers of defense.

This work was done when Baris Coskun was with Yahoo! Research.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Where the context makes the notation unambiguous, we skip the prefix and use DW only for training-DW or testing-DW. Similarly for LW.

References

  1. von Ahn, L., Blum, M., Hopper, N.J., Langford, J.: CAPTCHA: using hard AI problems for security. In: Biham, E. (ed.) EUROCRYPT 2003. LNCS, vol. 2656, pp. 294–311. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-39200-9_18

    Chapter  Google Scholar 

  2. Benevenuto, F., Magno, G., Rodrigues, T., Almeida, V.: Detecting spammers on twitter. In: Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (CEAS), vol. 6, p. 12 (2010)

    Google Scholar 

  3. Bilge, L., Han, Y., Dell’Amico, M.: Riskteller: predicting the risk of cyber incidents. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, pp. 1299–1311. ACM, New York, NY, USA (2017). https://doi.org/10.1145/3133956.3134022, https://doi.acm.org/10.1145/3133956.3134022

  4. Blanzieri, E., Bryl, A.: A survey of learning-based techniques of email spam filtering. Artif. Intell. Rev. 29(1), 63–92 (2008). https://doi.org/10.1007/s10462-009-9109-6. https://dx.doi.org/10.1007/s10462-009-9109-6

    Article  Google Scholar 

  5. Boshmaf, Y., et al.: Integro: leveraging victim prediction for robust fake account detection in OSNs. In: 22nd Annual Network and Distributed System Security Symposium (NDSS), San Diego, California, USA, 8–11 February 2015, pp. 1–15. http://www.internetsociety.org/doc/integro-leveraging-victim-prediction-robust-fake-account-detection-osns

  6. Canali, D., Bilge, L., Balzarotti, D.: On the effectiveness of risk prediction based on users browsing behavior. In: Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security, ASIA CCS 2014, pp. 171–182. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2590296.2590347, https://doi.acm.org/10.1145/2590296.2590347

  7. Castillo, C., Donato, D., Gionis, A., Murdock, V., Silvestri, F.: Know your neighbors: web spam detection using the web topology. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2007, pp. 423–430. ACM, New York, NY, USA (2007). https://doi.org/10.1145/1277741.1277814, https://doi.acm.org/10.1145/1277741.1277814

  8. Egele, M., Stringhini, G., Kruegel, C., Vigna, G.: COMPA: detecting compromised accounts on social networks. In: Proceedings of the Network & Distributed System Security Symposium, NDSS 2013, ISOC, February 2013

    Google Scholar 

  9. Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D.: Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15(1), 3133–3181 (2014). http://dl.acm.org/citation.cfm?id=2627435.2697065

    MathSciNet  MATH  Google Scholar 

  10. Halawa, H., Beznosov, K., Boshmaf, Y., Coskun, B., Ripeanu, M., Santos-Neto, E.: Harvesting the low-hanging fruits: defending against automated large-scale cyber-intrusions by focusing on the vulnerable population. In: Proceedings of the 2016 New Security Paradigms Workshop, NSPW 2016, pp. 11–22. ACM, New York, NY, USA (2016). https://doi.org/10.1145/3011883.3011885, https://doi.acm.org/10.1145/3011883.3011885

  11. Halawa, H., Ripeanu, M., Beznosov, K., Coskun, B., Liu, M.: Forecasting suspicious account activity at large-scale online service providers. CoRR abs/1801.08629 (2018). http://arxiv.org/abs/1801.08629

  12. He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009). https://doi.org/10.1109/TKDE.2008.239

    Article  Google Scholar 

  13. Ho, G., Javed, A.S.M., Paxson, V., Wagner, D.: Detecting credential spearphishing attacks in enterprise settings. In: Proceedings of the 26rd USENIX Security Symposium, USENIX Security 2017, pp. 469–485 (2017)

    Google Scholar 

  14. Jagatic, T.N., Johnson, N.A., Jakobsson, M., Menczer, F.: Social phishing. Commun. ACM 50(10), 94–100 (2007)

    Article  Google Scholar 

  15. Liu, G., Xiang, G., Pendleton, B.A., Hong, J.I., Liu, W.: Smartening the crowds: computational techniques for improving human verification to fight phishing scams. In: Proceedings of the Seventh Symposium on Usable Privacy and Security, SOUPS 2011, pp. 8:1–8:13. ACM, New York, NY, USA (2011). https://doi.org/10.1145/2078827.2078838, https://doi.acm.org/10.1145/2078827.2078838

  16. Liu, Y., et al.: Cloudy with a chance of breach: forecasting cyber security incidents. In: Proceedings of the 24th USENIX Security Symposium, USENIX Security 2015, pp. 1009–1024 (2015)

    Google Scholar 

  17. Lomax, S., Vadera, S.: A survey of cost-sensitive decision tree induction algorithms. ACM Comput. Surv. 45(2), 16:1–16:35 (2013). https://doi.org/10.1145/2431211.2431215. https://doi.acm.org/10.1145/2431211.2431215

    Article  MATH  Google Scholar 

  18. Ludl, C., McAllister, S., Kirda, E., Kruegel, C.: On the effectiveness of techniques to detect phishing sites. In: M. Hämmerli, B., Sommer, R. (eds.) DIMVA 2007. LNCS, vol. 4579, pp. 20–39. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73614-1_2

    Chapter  Google Scholar 

  19. Moore, T., Clayton, R., Anderson, R.: The economics of online crime. J. Econ. Perspect. 23(3), 3–20 (2009). https://doi.org/10.1257/jep.23.3.3. https://www.aeaweb.org/articles/?doi=10.1257/jep.23.3.3

    Article  Google Scholar 

  20. Provost, F., Fawcett, T.: Robust classification for imprecise environments. Mach. Learn. 42(3), 203–231 (2001). https://doi.org/10.1023/A:1007601015854. https://dx.doi.org/10.1023/A:1007601015854

    Article  MATH  Google Scholar 

  21. Shon, T., Moon, J.: A hybrid machine learning approach to network anomaly detection. Inf. Sci. 177(18), 3799–3821 (2007)

    Article  Google Scholar 

  22. Soska, K., Christin, N.: Automatically detecting vulnerable websites before they turn malicious. In: Proceedings of the 23rd USENIX Security Symposium, USENIX Security 2014, pp. 625–640 (2014)

    Google Scholar 

  23. Stein, T., Chen, E., Mangla, K.: Facebook immune system. In: Proceedings of the 4th Workshop on Social Network Systems, SNS 2011, pp. 8:1–8:8. ACM, New York, NY, USA (2011). https://doi.org/10.1145/1989656.1989664. https://doi.acm.org/10.1145/1989656.1989664

  24. Thomas, K., Li, F., Grier, C., Paxson, V.: Consequences of connectivity: characterizing account hijacking on twitter. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS 2014, pp. 489–500. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2660267.2660282. https://doi.acm.org/10.1145/2660267.2660282

  25. Wang, G., Konolige, T., Wilson, C., Wang, X., Zheng, H., Zhao, B.Y.: You are how you click: clickstream analysis for sybil detection. In: Proceedings of the 22Nd USENIX Conference on Security, SEC 2013, pp. 241–256. USENIX Association, Berkeley, CA, USA (2013). http://dl.acm.org/citation.cfm?id=2534766.2534788

  26. Whittaker, C., Ryner, B., Nazif, M.: Large-scale automatic classification of phishing pages. In: Proceedings of the 17th Annual Network and Distributed System Security Symposium, NDSS Symposium 2010, San Diego, CA, USA (2010)

    Google Scholar 

  27. Yang, Z., Wilson, C., Wang, X., Gao, T., Zhao, B.Y., Dai, Y.: Uncovering social network sybils in the wild. In: Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference, IMC 2011, pp. 259–268. ACM, New York, NY, USA (2011). https://doi.org/10.1145/2068816.2068841. https://doi.acm.org/10.1145/2068816.2068841

  28. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud 2010, p. 10. USENIX Association, Berkeley, CA, USA (2010). http://dl.acm.org/citation.cfm?id=1863103.1863113

  29. Zhang, J., et al.: Safeguarding academic accounts and resources with the university credential abuse auditing system. In: IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), pp. 1–8, June 2012. https://doi.org/10.1109/DSN.2012.6263961

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hassan Halawa .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 International Financial Cryptography Association

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Halawa, H., Beznosov, K., Coskun, B., Liu, M., Ripeanu, M. (2019). Forecasting Suspicious Account Activity at Large-Scale Online Service Providers. In: Goldberg, I., Moore, T. (eds) Financial Cryptography and Data Security. FC 2019. Lecture Notes in Computer Science(), vol 11598. Springer, Cham. https://doi.org/10.1007/978-3-030-32101-7_33

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-32101-7_33

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32100-0

  • Online ISBN: 978-3-030-32101-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics