Security of Data Science and Data Science for Security

  • Bernhard Tellenbach
  • Marc Rennhard
  • Remo Schweizer


In this chapter, we present a brief overview of important topics at the intersection of data science and security. In the first part, we focus on the security of data science and discuss a selection of security aspects that data scientists should consider to make their services and products more secure. In the second part, on data science for security, we switch sides and present some applications where data science plays a critical role in advancing the state of the art in securing information systems. This includes a detailed look at the potential and challenges of applying machine learning to the problem of detecting obfuscated JavaScripts.





Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Bernhard Tellenbach (1, corresponding author)
  • Marc Rennhard (1)
  • Remo Schweizer (1)

  1. ZHAW Zurich University of Applied Sciences, Winterthur, Switzerland
