Security of Data Science and Data Science for Security

  • Bernhard Tellenbach
  • Marc Rennhard
  • Remo Schweizer
Chapter

Abstract

In this chapter, we present a brief overview of important topics at the intersection of data science and security. In the first part, we focus on the security of data science and discuss a selection of security aspects that data scientists should consider to make their services and products more secure. In the second part, on data science for security, we switch sides and present some applications where data science plays a critical role in advancing the state of the art in securing information systems. This includes a detailed look at the potential and challenges of applying machine learning to the problem of detecting obfuscated JavaScripts.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Bernhard Tellenbach (1), Email author
  • Marc Rennhard (1)
  • Remo Schweizer (1)
  1. ZHAW Zurich University of Applied Sciences, Winterthur, Switzerland
