Impact of Outliers on Anonymized Categorical Data

Venkata Ramana, K.; Valli Kumari, V.; Raju, K. V. S. V. N.

doi:10.1007/978-3-642-24055-3_33

Part of the book series: Communications in Computer and Information Science (CCIS, volume 205)

Included in the following conference series:

International Conference on Digital Image Processing and Information Technology

Abstract

Preserving privacy is indispensable when publishing microdata that contains sensitive information. Anonymization principles such as k-anonymity and l-diversity were developed to protect such information. Disclosure occurs when an adversary with sufficient background knowledge can infer an individual’s sensitive information from the published microdata. Neither of these principles addresses the presence of outliers. Outliers can be classified into two types: local and global. This paper proposes a practically feasible distance-based algorithm to anonymize local outliers. The proposed algorithm can handle both numerical and categorical data. Experimental results for the proposed approach, focused on categorical data, are presented.
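
The paper's algorithm is not reproduced on this page, so the following is only a minimal illustrative sketch of the concepts the abstract names, not the authors' method. It assumes records are tuples of categorical values, uses Hamming distance (the number of differing attributes) as the distance measure, and uses a simple neighbour-count rule to flag distance-based outliers; it does not attempt the local/global distinction drawn in the abstract. The function names, the sample records, and the parameters `radius` and `min_neighbours` are all hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of attribute positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def is_k_anonymous(records, qi_idx, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[i] for i in qi_idx) for r in records)
    return all(count >= k for count in groups.values())


def is_l_diverse(records, qi_idx, sens_idx, l):
    """True if every quasi-identifier group holds at least l distinct
    sensitive values (the simple 'distinct' reading of l-diversity)."""
    groups = {}
    for r in records:
        key = tuple(r[i] for i in qi_idx)
        groups.setdefault(key, set()).add(r[sens_idx])
    return all(len(values) >= l for values in groups.values())


def distance_outliers(records, radius, min_neighbours):
    """Indices of records that have fewer than min_neighbours other
    records within the given Hamming radius."""
    flagged = []
    for i, r in enumerate(records):
        near = sum(
            1
            for j, s in enumerate(records)
            if j != i and hamming(r, s) <= radius
        )
        if near < min_neighbours:
            flagged.append(i)
    return flagged


# Hypothetical microdata: (occupation, marital status, disease).
# The first two attributes are treated as quasi-identifiers, the last
# as the sensitive attribute.
records = [
    ("clerk", "married", "flu"),
    ("clerk", "married", "cold"),
    ("clerk", "single", "flu"),
    ("clerk", "single", "cold"),
    ("pilot", "widowed", "asthma"),
]

outlier_idx = distance_outliers(records, radius=1, min_neighbours=2)
with_outlier = is_k_anonymous(records, qi_idx=(0, 1), k=2)
without_outlier = is_k_anonymous(records[:4], qi_idx=(0, 1), k=2)
```

In this toy table the last record shares no attribute value with any other record, so it is the only one flagged, and it is also the record that prevents the table from being 2-anonymous. That is the kind of interaction between outliers and anonymization the abstract points to.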

Author information

Authors and Affiliations

Department of Computer Science and Systems Engineering, Andhra University, Visakhapatnam, Andhra Pradesh, India, 530 003
K. Venkata Ramana, V. Valli Kumari & K. V. S. V. N. Raju

Editor information

Editors and Affiliations

Wireilla Net Solutions PTY Ltd, Melbourne, Victoria, Australia
Dhinaharan Nagamalai
Institut Telecom/Telecom SudParis, 9, rue Charles Fourier, 91011, Evry Cedex, France
Eric Renault
Manonmaniam Sundaranar University, Thirunelveli, Tamil Nadu, India
Murugan Dhanuskodi

About this paper

Cite this paper

Venkata Ramana, K., Valli Kumari, V., Raju, K.V.S.V.N. (2011). Impact of Outliers on Anonymized Categorical Data. In: Nagamalai, D., Renault, E., Dhanuskodi, M. (eds) Advances in Digital Image Processing and Information Technology. DPPR 2011. Communications in Computer and Information Science, vol 205. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24055-3_33

DOI: https://doi.org/10.1007/978-3-642-24055-3_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24054-6
Online ISBN: 978-3-642-24055-3
eBook Packages: Computer Science (R0)
