Abstract
Preserving privacy is indispensable when publishing microdata that contains sensitive information. Anonymization principles such as k-anonymity and l-diversity were developed to protect such information. Disclosure occurs when an adversary with sufficient background knowledge can infer an individual’s sensitive information from the published microdata. Neither of these principles addresses the presence of outliers. Outliers can be classified into two types: local and global. This paper proposes a practically feasible distance-based algorithm to anonymize local outliers. The proposed algorithm can handle both numerical and categorical data. Experimental results of the proposed approach on categorical data are presented.
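The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea behind distance-based outlier detection on categorical records, not the authors' method. It assumes Hamming distance as the dissimilarity measure and uses hypothetical parameters `radius` and `min_neighbors`; the sample records are invented for illustration. It flags records with too few neighbours in the whole dataset, which is the simpler global notion; detecting local outliers would additionally compare each record against its own neighbourhood.

```python
def hamming(a, b):
    """Count the attribute positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_based_outliers(records, radius, min_neighbors):
    """Return indices of records that have fewer than `min_neighbors`
    other records within Hamming distance `radius` (assumed parameters)."""
    flagged = []
    for i, r in enumerate(records):
        neighbors = sum(
            1
            for j, s in enumerate(records)
            if j != i and hamming(r, s) <= radius
        )
        if neighbors < min_neighbors:
            flagged.append(i)
    return flagged


# Invented quasi-identifier tuples: (education, marital status, occupation).
records = [
    ("Bachelors", "Married", "Sales"),
    ("Bachelors", "Married", "Sales"),
    ("Bachelors", "Single", "Sales"),
    ("Bachelors", "Married", "Clerical"),
    ("Doctorate", "Widowed", "Farming"),
]

outliers = distance_based_outliers(records, radius=1, min_neighbors=2)
print(outliers)  # indices of records that would need extra anonymization
```

A record flagged this way is easy to single out even after generalization, which is why outliers need separate treatment before publication.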
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Venkata Ramana, K., Valli Kumari, V., Raju, K.V.S.V.N. (2011). Impact of Outliers on Anonymized Categorical Data. In: Nagamalai, D., Renault, E., Dhanuskodi, M. (eds) Advances in Digital Image Processing and Information Technology. DPPR 2011. Communications in Computer and Information Science, vol 205. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24055-3_33
DOI: https://doi.org/10.1007/978-3-642-24055-3_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24054-6
Online ISBN: 978-3-642-24055-3
eBook Packages: Computer Science (R0)