Big Data Feature Selection to Achieve Anonymization

Selvi, U.; Pushpa, S.

doi:10.1007/978-981-15-2612-1_6


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 637)


Abstract

In the age of big data, data volumes are growing rapidly in many fields, and the data shared by users is at great risk. To preserve the privacy of individuals, anonymization techniques such as k-anonymity-related algorithms and differential privacy have been proposed to ensure that the resulting dataset is free from privacy disclosure. However, the majority of these anonymization algorithms are applied in isolation, without considering the utility of the data for knowledge tasks, which makes the dataset less informative. The presence of redundant data also degrades performance and reduces the accuracy of anonymization. Hence, preprocessing-based anonymization is required to increase utility and to achieve accuracy in anonymization. This paper applies the fast correlation-based filter (FCBF) feature selection solution to select the relevant features and remove the redundant data. k-anonymity is then applied to the dataset to achieve data anonymization. Anonymized datasets obtained with and without preprocessing were compared on a real-world dataset, and the results are reported.
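
The two-stage idea in the abstract (FCBF feature selection, then k-anonymity) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy columns, and the threshold `delta` are assumptions made for illustration. The sketch follows the general FCBF scheme of ranking features by symmetrical uncertainty with the class and discarding features that are more strongly correlated with an already selected feature than with the class. The k-anonymity step here only checks the property; it does not perform generalization or suppression.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(columns, target, delta=0.0):
    """Return the selected feature names, most relevant first.

    columns: dict mapping feature name -> list of discrete values
    target:  list of class labels
    delta:   relevance threshold on SU(feature, class) (assumed value)
    """
    relevance = {name: symmetrical_uncertainty(col, target)
                 for name, col in columns.items()}
    ranked = sorted((n for n in relevance if relevance[n] >= delta),
                    key=lambda n: relevance[n], reverse=True)
    selected = []
    while ranked:
        best = ranked.pop(0)
        selected.append(best)
        # Drop features that are at least as correlated with `best`
        # as they are with the class (treated as redundant).
        ranked = [n for n in ranked
                  if symmetrical_uncertainty(columns[best], columns[n]) < relevance[n]]
    return selected


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Hypothetical toy data: one relevant column, one redundant copy, one noise column.
columns = {
    "age_band":      ["young", "young", "old", "old", "young", "old", "young", "old"],
    "age_band_copy": ["y", "y", "o", "o", "y", "o", "y", "o"],
    "noise":         ["a", "b", "a", "b", "b", "a", "a", "b"],
}
target = ["low", "low", "high", "high", "low", "high", "low", "high"]
rows = [dict(zip(columns, vals)) for vals in zip(*columns.values())]

selected = fcbf(columns, target, delta=0.1)
```

On this toy data, `selected` contains only `age_band`: the copy is removed as redundant and the noise column falls below the threshold. With `age_band` and `noise` together as quasi-identifiers the rows are only 2-anonymous, whereas with `age_band` alone they are 4-anonymous, which illustrates why removing irrelevant attributes before anonymization can help.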



Compliance with Ethical Standards

The authors state that there is no conflict of interest. We used our own data. No humans or animals were involved in this research work.

Author information

Authors and Affiliations

St. Peter’s Institute of Higher Education and Research, Chennai, Tamil Nadu, India
U. Selvi & S. Pushpa

Editor information

Editors and Affiliations

Department of ECE, PPG Institute of Technology, Coimbatore, Tamil Nadu, India
V. Bindhu
Department of Electrical Engineering, Dayeh University, Changhua City, Taiwan
Joy Chen
Faculty of Engineering, University of Porto, Porto, Portugal
João Manuel R. S. Tavares

About this chapter

Cite this chapter

Selvi, U., Pushpa, S. (2020). Big Data Feature Selection to Achieve Anonymization. In: Bindhu, V., Chen, J., Tavares, J. (eds) International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol 637. Springer, Singapore. https://doi.org/10.1007/978-981-15-2612-1_6

DOI: https://doi.org/10.1007/978-981-15-2612-1_6
Published: 05 March 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2611-4
Online ISBN: 978-981-15-2612-1
eBook Packages: Engineering, Engineering (R0)
