Scalable Big Data Privacy with MapReduce

  • Reference work entry
Encyclopedia of Big Data Technologies

Overview

Processing big data to derive useful information has been in the spotlight in recent years, and numerous approaches to analysing big data have been proposed. Data privacy remains a concern throughout this process, however, because the data may come from various sources and may contain sensitive personal information about individuals. Hadoop MapReduce is considered one of the most promising approaches for big data processing. This chapter provides an overview of the MapReduce environment, the privacy challenges that arise when data is processed in a MapReduce cluster, and the existing approaches researchers have adopted to mitigate these issues. We also offer guidelines for future work on anonymized data processing that ensures individual privacy in MapReduce.

Introduction

Big data analytics is an emerging technology for finding new insights from large amounts of data. Processing and analyzing these large amounts of data require an extra set of tools and services....



Author information

Corresponding author

Correspondence to Xuyun Zhang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this entry

Cite this entry

Bazai, S.U., Jang-Jaccard, J., Zhang, X. (2019). Scalable Big Data Privacy with MapReduce. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_243
