Impact of Outliers on Anonymized Categorical Data

  • Conference paper
Advances in Digital Image Processing and Information Technology (DPPR 2011)

Abstract

Preserving privacy is indispensable when publishing microdata that contains sensitive information. Anonymization principles such as k-anonymity and l-diversity were developed to protect this information. Disclosure occurs when an adversary with sufficient background knowledge can infer an individual's sensitive information from the published microdata. Neither of these principles addresses the presence of outliers. Outliers can be classified into two types: local and global. This paper proposes a practically feasible distance-based algorithm to anonymize local outliers. The proposed algorithm can handle both numerical and categorical data. Experimental results for the proposed approach, focused on categorical data, are presented.
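The two ideas named in the abstract, k-anonymity and distance-based outliers, can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, which is not reproduced on this page. It assumes Hamming distance as the metric for categorical records and a simple neighbor-count rule for flagging outliers, and it does not distinguish local from global outliers. The function names, the `radius` and `min_neighbors` parameters, and the sample records are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(records)
    return all(c >= k for c in counts.values())


def hamming(a, b):
    """Count the attribute positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_outliers(records, radius, min_neighbors):
    """Return indices of records with fewer than min_neighbors other records within radius.

    Assumption: a plain neighbor-count rule, chosen only for illustration.
    """
    flagged = []
    for i, r in enumerate(records):
        neighbors = sum(
            1
            for j, s in enumerate(records)
            if j != i and hamming(r, s) <= radius
        )
        if neighbors < min_neighbors:
            flagged.append(i)
    return flagged


# Hypothetical quasi-identifier tuples (work class, education, marital status).
records = [
    ("Private", "Bachelors", "Married"),
    ("Private", "Bachelors", "Married"),
    ("Private", "Bachelors", "Single"),
    ("Private", "Masters", "Married"),
    ("Self-emp", "Doctorate", "Widowed"),
]

print(is_k_anonymous(records, 2))       # False: several combinations are unique
print(distance_outliers(records, 1, 2))  # [4]: the last record has no close neighbors
```

In this toy table the last record differs from every other record in all three attributes, so it is the kind of record that a generalization step would have to distort heavily to hide, which is the concern the abstract raises about outliers.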

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Venkata Ramana, K., Valli Kumari, V., Raju, K.V.S.V.N. (2011). Impact of Outliers on Anonymized Categorical Data. In: Nagamalai, D., Renault, E., Dhanuskodi, M. (eds) Advances in Digital Image Processing and Information Technology. DPPR 2011. Communications in Computer and Information Science, vol 205. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24055-3_33

  • DOI: https://doi.org/10.1007/978-3-642-24055-3_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24054-6

  • Online ISBN: 978-3-642-24055-3

  • eBook Packages: Computer Science, Computer Science (R0)
