Abstract
In the face of large-scale automated social engineering attacks on large online services, fast detection and remediation of compromised accounts are crucial to limit the spread of an attack and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning: we develop an early warning system that harnesses account activity traces to predict which accounts are likely to be compromised in the future. We demonstrate the feasibility and applicability of the system through an experiment at a large-scale online service provider using four months of real-world production data encompassing hundreds of millions of users. We show that, even when we limit ourselves to login data only (in order to derive features with low computational cost) and to a basic model selection approach, our classifier can be tuned to achieve good classification precision when used for forecasting. Our system correctly identifies, up to one month in advance, the accounts later flagged as suspicious, with precision, recall, and false positive rates that indicate the mechanism is likely to prove valuable in operational settings to support additional layers of defense.
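To make the evaluation terms above concrete, the following is a minimal sketch of how precision, recall, and false positive rate can be computed for a score-based classifier, and how a decision threshold can be tuned toward a target precision. It is not the authors' implementation: the function names, the toy scores and labels, and the tuning rule (pick the lowest threshold that meets a precision target) are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def confusion_counts(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) when accounts scoring >= threshold are flagged."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label:
            tp += 1
        elif flagged and not label:
            fp += 1
        elif not flagged and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, false positive rate); 0.0 if undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, fpr


def tune_threshold(
    scores: Sequence[float], labels: Sequence[int], min_precision: float
) -> Optional[Tuple[float, float, float, float]]:
    """Hypothetical tuning rule: lowest threshold whose precision meets the target.

    Recall never increases as the threshold rises, so the lowest qualifying
    threshold keeps recall as high as possible for the given precision target.
    Returns (threshold, precision, recall, fpr), or None if no threshold works.
    """
    for threshold in sorted(set(scores)):
        precision, recall, fpr = rates(*confusion_counts(scores, labels, threshold))
        if precision >= min_precision:
            return threshold, precision, recall, fpr
    return None


# Toy data (invented): classifier scores and whether each account
# was later flagged as suspicious (1) or not (0).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    print(tune_threshold(toy_scores, toy_labels, min_precision=0.75))
```

Raising the precision target in this sketch pushes the threshold up and trades away recall, which is the kind of tuning decision an operator would face when choosing how aggressively to act on forecasts.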
This work was done when Baris Coskun was with Yahoo! Research.
Notes
1. Where the context makes the notation unambiguous, we skip the prefix and use DW only for training-DW or testing-DW. Similarly for LW.
Copyright information
© 2019 International Financial Cryptography Association
About this paper
Cite this paper
Halawa, H., Beznosov, K., Coskun, B., Liu, M., Ripeanu, M. (2019). Forecasting Suspicious Account Activity at Large-Scale Online Service Providers. In: Goldberg, I., Moore, T. (eds) Financial Cryptography and Data Security. FC 2019. Lecture Notes in Computer Science, vol 11598. Springer, Cham. https://doi.org/10.1007/978-3-030-32101-7_33
DOI: https://doi.org/10.1007/978-3-030-32101-7_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32100-0
Online ISBN: 978-3-030-32101-7
eBook Packages: Computer Science, Computer Science (R0)