Abstract
In the age of big data, the volume of data is growing rapidly in many fields, and the data that users share is at considerable risk of disclosure. To preserve individual privacy, anonymization-based approaches such as k-anonymity-related algorithms and differential privacy have been proposed to ensure that the released dataset is free from privacy disclosure. However, most of these anonymization algorithms are applied in isolation, without considering the utility of the data for downstream knowledge tasks, which makes the anonymized dataset less informative. Redundant data further degrades performance and reduces the accuracy of anonymization. A preprocessing-based anonymization is therefore required to increase utility and improve the accuracy of anonymization. This paper applies the fast correlation-based filter (FCBF) feature selection solution to select the relevant features and remove redundant data. k-anonymity is then applied to the dataset to achieve data anonymization. On a real-world dataset, the anonymized output obtained with preprocessing is compared against the output obtained without preprocessing, and the results are reported.
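The two-stage pipeline described above (FCBF feature selection followed by a k-anonymity step) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes discrete-valued features, uses symmetrical uncertainty as the correlation measure as in the FCBF paper by Yu et al., and only checks whether a table is k-anonymous rather than generalizing it. The toy columns, the threshold `delta`, the function names, and the choice to treat the selected features as quasi-identifiers are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(columns, target, delta):
    """FCBF-style filter: keep relevant features, drop redundant ones.

    columns: dict mapping feature name -> list of discrete values
    target:  list of class labels
    delta:   relevance threshold on SU(feature, class) (hypothetical value)
    """
    relevance = {n: symmetrical_uncertainty(c, target) for n, c in columns.items()}
    # Step 1: keep features whose relevance reaches delta, most relevant first.
    ranked = sorted((n for n in relevance if relevance[n] >= delta),
                    key=lambda n: relevance[n], reverse=True)
    # Step 2: drop a feature if an already-kept feature is at least as
    # correlated with it as the feature is with the class.
    selected = []
    for name in ranked:
        redundant = any(
            symmetrical_uncertainty(columns[kept], columns[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Toy data (hypothetical): one relevant feature, an exact copy, and noise.
columns = {
    "age_band": ["young", "young", "old", "old", "young", "old", "young", "old"],
    "age_copy": ["young", "young", "old", "old", "young", "old", "young", "old"],
    "noise":    ["a", "b", "a", "b", "b", "a", "a", "b"],
}
target = ["low", "low", "high", "high", "low", "high", "low", "high"]

selected = fcbf(columns, target, delta=0.1)

# Assumption: the selected features are treated as the quasi-identifiers.
rows = [dict(zip(selected, vals)) for vals in zip(*(columns[n] for n in selected))]
```

In this toy table the duplicate column is discarded as redundant and the noise column fails the relevance threshold, so fewer attributes remain to be generalized before `is_k_anonymous` is checked. A real pipeline would follow the check with a generalization or suppression step when the table is not yet k-anonymous.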
Compliance with Ethical Standards
All authors state that there is no conflict of interest. We used our own data. Humans and animals are not involved in this research work.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Selvi, U., Pushpa, S. (2020). Big Data Feature Selection to Achieve Anonymization. In: Bindhu, V., Chen, J., Tavares, J. (eds) International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol 637. Springer, Singapore. https://doi.org/10.1007/978-981-15-2612-1_6
DOI: https://doi.org/10.1007/978-981-15-2612-1_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2611-4
Online ISBN: 978-981-15-2612-1
eBook Packages: Engineering, Engineering (R0)