Overview
Processing big data to derive useful information has been in the spotlight in recent years. Numerous approaches have been proposed for analysing big data. However, data privacy remains a concern throughout the process, because the data may come from various sources and may contain sensitive personal information about individuals. Hadoop MapReduce is considered one of the most promising approaches for big data processing. This chapter provides an overview of the MapReduce environment, the privacy challenges faced when processing data in a MapReduce cluster, and the existing approaches that researchers have adopted to mitigate these issues. We also provide future guidelines for anonymized data processing to ensure individual privacy in MapReduce.
Introduction
Big data analytics is an emerging technology for finding new insights from large amounts of data. Processing and analyzing these large amounts of data require an extra set of tools and services....
Copyright information
© 2019 Springer Nature Switzerland AG
About this entry
Cite this entry
Bazai, S.U., Jang-Jaccard, J., Zhang, X. (2019). Scalable Big Data Privacy with MapReduce. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_243
DOI: https://doi.org/10.1007/978-3-319-77525-8_243
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77524-1
Online ISBN: 978-3-319-77525-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering