Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 637)

Abstract

In the age of big data, the volume of data is growing tremendously in many fields, and the data shared by users is at great risk. To preserve individual privacy, anonymization-based approaches such as k-anonymity and its related algorithms, as well as differential privacy, have been proposed to ensure that the resulting dataset is free from privacy disclosure. However, most of these anonymization algorithms are applied in isolation, without considering the utility of the data for knowledge-discovery tasks, which makes the anonymized dataset less informative. The presence of redundant data further decreases performance and reduces the accuracy of anonymization. Hence, a preprocessing-based anonymization approach is required to increase utility and achieve accurate anonymization. This paper applies the fast correlation-based filter (FCBF) feature selection method to select the relevant features and remove redundant data; k-anonymity is then applied to the reduced dataset to achieve data anonymization. Anonymized versions of a real-world dataset, produced with and without preprocessing, were compared, and the results are reported.
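The two-stage pipeline summarized above (FCBF feature selection, then k-anonymity) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes discrete-valued features, scores them with symmetric uncertainty (SU) and drops a feature when an already selected, more relevant feature has SU with it at least as high as the feature's own SU with the class (the redundancy rule from Yu and Liu's FCBF paper), and checks k-anonymity after a simple fixed-width age generalization. The function names, the `delta` threshold default, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy H(X), in bits, of a list of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X | Y) = sum over y of p(y) * H(X | Y = y)."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    info_gain = hx - conditional_entropy(xs, ys)
    return 2.0 * info_gain / (hx + hy)


def fcbf(features, target, delta=0.0):
    """Return the feature names kept by an FCBF-style filter (sketch).

    features: dict mapping a (hypothetical) feature name -> list of values.
    delta: hypothetical relevance threshold on SU with the class.
    """
    su_c = {f: symmetric_uncertainty(v, target) for f, v in features.items()}
    # Relevance step: keep features whose SU with the class exceeds delta,
    # ranked from most to least relevant (ties keep their input order).
    ranked = sorted((f for f in features if su_c[f] > delta),
                    key=lambda f: su_c[f], reverse=True)
    selected = []
    while ranked:
        fp = ranked.pop(0)  # current predominant feature
        selected.append(fp)
        # Redundancy step: drop fq when SU(fp, fq) >= SU(fq, class).
        ranked = [fq for fq in ranked
                  if symmetric_uncertainty(features[fp], features[fq]) < su_c[fq]]
    return selected


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return all(c >= k for c in counts.values())


def generalize_age(age, width=10):
    """Replace an exact age with a fixed-width range, e.g. 27 -> '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical toy data, for illustration only.
target = ["yes", "yes", "no", "no", "yes", "yes"]
features = {
    "education": ["BSc", "BSc", "HS", "HS", "MSc", "MSc"],
    "education_copy": ["BSc", "BSc", "HS", "HS", "MSc", "MSc"],  # redundant
    "zip": ["600001"] * 6,  # constant, hence irrelevant to the class
}
ages = [23, 27, 35, 38, 41, 44]

kept = fcbf(features, target)
print(kept)  # ['education']

rows = [{"age": a, "education": e} for a, e in zip(ages, features["education"])]
print(is_k_anonymous(rows, ["age", "education"], k=2))  # False

generalized = [{**r, "age": generalize_age(r["age"])} for r in rows]
print(is_k_anonymous(generalized, ["age", "education"], k=2))  # True
```

In this toy run the duplicated column is removed as redundant (its SU with `education` is 1.0) and the constant column is removed as irrelevant (its SU with the class is 0), so fewer attributes reach the anonymization step. Real k-anonymity algorithms search over generalization hierarchies and suppression rather than a single fixed-width age bucket; the check here only shows the property being enforced.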

Compliance with Ethical Standards

The authors state that there is no conflict of interest. We used our own data. No humans or animals were involved in this research work.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Selvi, U., Pushpa, S. (2020). Big Data Feature Selection to Achieve Anonymization. In: Bindhu, V., Chen, J., Tavares, J. (eds) International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol 637. Springer, Singapore. https://doi.org/10.1007/978-981-15-2612-1_6

  • DOI: https://doi.org/10.1007/978-981-15-2612-1_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-2611-4

  • Online ISBN: 978-981-15-2612-1

  • eBook Packages: Engineering, Engineering (R0)