Abstract
Privacy protection for Resource Description Framework (RDF) data is important because RDF (i.e., linked data) is a widely used format for publishing data in many areas, including government open data, personal health care, and social relationships. Because such data can contain private information about individuals or companies and can expose that information to third parties, several anonymization models are used in practice to preserve privacy, and k-anonymity in particular has gained attention in research. Recently, several RDF anonymization models have been proposed. However, current approaches focus on an anonymization model and a metric for measuring information loss, and do not consider large-scale RDF data. In this paper, we propose an efficient anonymization method for large-scale RDF data: a greedy partitioning algorithm for RDF anonymization built on SPARK, a leading platform for big data processing. Experimental results on synthetic datasets show that our method requires less running time than previous methods.
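The abstract names a greedy partitioning algorithm without detailing it. The sketch below only illustrates greedy partitioning for k-anonymity in general, using the median-split heuristic of LeFevre et al.'s Mondrian algorithm rather than the authors' SPARK-based method. It assumes each RDF subject has already been flattened into one record of numeric quasi-identifiers; the `ex:` subjects, the `age` and `zip` attributes, and the function names are all hypothetical. It runs in plain Python on a single machine, not on SPARK.

```python
from typing import List


def mondrian_partition(records: List[dict], qi_attrs: List[str], k: int) -> List[List[dict]]:
    """Greedily split records at the median of the widest quasi-identifier,
    recursing only while both halves keep at least k records."""
    if len(records) < 2 * k:
        return [records]  # any cut would leave a half with fewer than k records

    def value_range(attr: str) -> float:
        values = [r[attr] for r in records]
        return max(values) - min(values)

    # Try attributes widest-range first; fall back to narrower ones if a cut fails.
    for attr in sorted(qi_attrs, key=value_range, reverse=True):
        ordered = sorted(records, key=lambda r: r[attr])
        split_value = ordered[len(ordered) // 2][attr]
        left = [r for r in ordered if r[attr] < split_value]
        right = [r for r in ordered if r[attr] >= split_value]
        if len(left) >= k and len(right) >= k:
            return (mondrian_partition(left, qi_attrs, k)
                    + mondrian_partition(right, qi_attrs, k))
    return [records]  # no allowable cut: this group becomes one equivalence class


def generalize(groups: List[List[dict]], qi_attrs: List[str]) -> List[dict]:
    """Replace each quasi-identifier value with the (min, max) range of its group."""
    anonymized = []
    for group in groups:
        ranges = {a: (min(r[a] for r in group), max(r[a] for r in group)) for a in qi_attrs}
        for r in group:
            out = dict(r)  # subject IRIs kept only for inspection; a real release would mask them
            out.update(ranges)
            anonymized.append(out)
    return anonymized


# Hypothetical toy data: one flattened record per RDF subject.
people = [
    {"subject": "ex:p1", "age": 23, "zip": 13053},
    {"subject": "ex:p2", "age": 27, "zip": 13068},
    {"subject": "ex:p3", "age": 35, "zip": 13068},
    {"subject": "ex:p4", "age": 41, "zip": 13053},
    {"subject": "ex:p5", "age": 48, "zip": 14850},
    {"subject": "ex:p6", "age": 52, "zip": 14853},
    {"subject": "ex:p7", "age": 60, "zip": 14850},
    {"subject": "ex:p8", "age": 66, "zip": 14853},
]

groups = mondrian_partition(people, ["age", "zip"], k=2)
for row in generalize(groups, ["age", "zip"]):
    print(row["subject"], row["age"], row["zip"])
```

Each resulting group holds at least k records, so every generalized (age, zip) combination is shared by at least k subjects. Once the first splits are made, the remaining partitions are independent of one another, which makes them a natural unit of parallel work on a platform such as SPARK; how the paper actually distributes the work is not described in this abstract.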
Acknowledgement
This work was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-01417) supervised by the IITP (Institute for Information & Communications Technology Promotion) and IITP grant funded by the Korea government (MSIP) (No. R0113-15-0005, Development of a Unified Data Engineering Technology for Large-scale Transaction Processing and Real-Time Complex Analytics) and Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2018R1D1A1B07048380).
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Temuujin, O., Jeon, M., Seo, K., Ahn, J., Im, DH. (2020). SPARK-Based Partitioning Algorithm for k-Anonymization of Large RDFs. In: Park, J., Yang, L., Jeong, YS., Hao, F. (eds) Advanced Multimedia and Ubiquitous Engineering. MUE FutureTech 2019. Lecture Notes in Electrical Engineering, vol 590. Springer, Singapore. https://doi.org/10.1007/978-981-32-9244-4_41
DOI: https://doi.org/10.1007/978-981-32-9244-4_41
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9243-7
Online ISBN: 978-981-32-9244-4
eBook Packages: Engineering, Engineering (R0)