Developing Security Intelligence in Big Data
In today’s world, as the volume of digitized data grows exponentially, so do the need and the ability to store and computationally analyze large datasets. The term “big data” refers to datasets so large or complex that traditional data-processing software is inadequate to manage them. Google is a good example of a company that symbolizes the modern data-driven world: it is arguably the most successful IT company in the world as well as the largest data-processing company of modern times. In April 2004, Larry Page and Sergey Brin wrote their first and now famous “Founders’ Letter,” published with the company’s initial public offering, which stated: “Google is not a conventional company. We do not intend to become one.” Twelve years later, after a change in leadership, CEO Sundar Pichai wrote the 2016 Founders’ Letter and concluded it with: “Google is an information company. It was when it was founded, and it is today. And it’s what people do with that information that amazes and inspires me every day.”

Analyzing large volumes of data poses many challenges, including data capture and storage, analysis, curation, search, sharing and transfer, visualization, querying, and updating. The biggest challenge, however, is the information security and privacy of big data. Weak security around big data can lead to heavy financial losses and damage to a company’s reputation. Attackers are increasingly violating cyber rules and regulations, and their attacks also reach big data and the information it contains. They target personal and financial data, as well as a company’s confidential intellectual property, whose loss can seriously harm its competitiveness. The most serious threat arises when attackers target personal or consumer financial information stored in big data.
Although rules and regulations are in place to protect data, big data systems still have vulnerabilities serious enough to warrant substantial concern. In a highly publicized incident in March 2017, WikiLeaks released a huge trove of alleged internal documents from the US Central Intelligence Agency (CIA), at the time by far the largest leak of CIA documents in history. Its thousands of pages describe sophisticated software tools and techniques used by the agency to break into smartphones, computers, and even Internet-connected televisions. Both government and corporate leaks have been made possible by the ease of downloading, storing, and transferring millions of documents in a very short time. With this state of affairs in mind, these threats and attacks on big data need to be examined comprehensively, and novel approaches to defending it need to be studied. This chapter presents an in-depth look at the threats and attacks on big data and examines methods of defense and protection. We discuss the vulnerabilities of modern big data systems and the characteristic methods of intrusion and unauthorized seizure of data. We present a few case studies of big data weaknesses and their exploitation by attackers, which can help in building proper defenses against potential malicious incidents. We also discuss the specific security demands of big data environments in the government and medical sectors.
Keywords: Big data, Security threats, Cryptography, Digital footprints, Machine learning