Security Theories and Practices for Big Data

Big Data Concepts, Theories, and Applications

Abstract

Big data applications usually require flexible, scalable infrastructure for efficient processing. Cloud computing meets these requirements well and has been widely adopted to provide big data services. However, the outsourcing and resource-sharing features of cloud computing raise security concerns for big data applications, e.g., the confidentiality of data and programs and the integrity of the processing procedure. Conversely, when the cloud owns the data and provides analytic services, data privacy also becomes a challenge. These security concerns, together with the pressing demand for big data technology, motivate the development of a special class of security technologies for safe big data processing in cloud environments. The approaches fall roughly into two categories: designing new algorithms with unique security features, and developing security-enhanced systems to protect big data applications. In this chapter, we review approaches for secure big data processing from both categories, evaluate and compare these technologies from different perspectives, and present a general outlook on the current state of research and development in the field of security theories for big data.

Author information

Corresponding author

Correspondence to Lei Xu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Xu, L., Shi, W. (2016). Security Theories and Practices for Big Data. In: Yu, S., Guo, S. (eds) Big Data Concepts, Theories, and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-27763-9_4

  • DOI: https://doi.org/10.1007/978-3-319-27763-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27761-5

  • Online ISBN: 978-3-319-27763-9

  • eBook Packages: Computer Science, Computer Science (R0)
