Abstract
This paper investigates differentially private analysis of distance-based outliers. Outlier detection aims to identify instances that lie markedly far from the other instances, whereas differential privacy aims to conceal the presence (or absence) of any particular instance. Outlier detection and privacy protection are therefore intrinsically conflicting tasks. In this paper, we present differentially private queries that count the outliers appearing in a given subspace, rather than reporting the detected outliers themselves. Our analysis of the global sensitivity of outlier counts reveals that standard global-sensitivity-based methods can make the outputs excessively noisy, particularly when the dimensionality of the given subspace is high. Noting that outlier counts are typically small relative to the size of the dataset, we introduce a mechanism based on a smooth upper bound of the local sensitivity. This study is the first attempt to ensure differential privacy for distance-based outlier analysis. Experimental results show that our method achieves better utility than global-sensitivity-based methods.
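The abstract contrasts two ways of releasing an outlier count under differential privacy: Laplace noise calibrated to the global sensitivity (Dwork, McSherry, Nissim and Smith, 2006) and noise calibrated to a smooth upper bound of the local sensitivity (Nissim, Raskhodnikova and Smith, 2007). The Python sketch below illustrates both under stated assumptions; it is not the authors' mechanism. The (r, k) outlier rule, the global-sensitivity value, and the local-sensitivity profile passed to `smooth_upper_bound` are hypothetical placeholders, since the abstract does not give the paper's definitions or bounds. The constants alpha = epsilon/2 and beta = epsilon/(2 ln(2/delta)) follow our reading of the Laplace instantiation in Nissim et al. and should be checked against that paper before use.

```python
import math
import random


def count_distance_outliers(points, r, k):
    """Count points with fewer than k other points within Euclidean distance r.

    Assumed, simplified distance-based outlier rule in the spirit of Knorr
    and Ng; the paper's own definition and subspace restriction may differ.
    """
    count = 0
    for i, p in enumerate(points):
        neighbours = sum(
            1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= r
        )
        if neighbours < k:
            count += 1
    return count


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    if scale <= 0:
        raise ValueError("noise scale must be positive")
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_global(true_count, global_sens, epsilon, rng):
    """epsilon-DP Laplace mechanism with noise calibrated to global sensitivity."""
    return true_count + laplace_noise(global_sens / epsilon, rng)


def smooth_beta(epsilon, delta):
    """beta for Laplace noise under (epsilon, delta)-DP, per our reading of Nissim et al."""
    return epsilon / (2.0 * math.log(2.0 / delta))


def smooth_upper_bound(local_sens_at_distance, global_sens, beta, max_k=1000):
    """Return the max over k of exp(-beta * k) * A_k, with A_k = local_sens_at_distance(k).

    A_k must upper-bound the local sensitivity of every dataset within
    distance k of the input (hypothetical, user-supplied function). Each A_k
    is capped at global_sens, so every term with k > max_k is at most
    exp(-beta * (max_k + 1)) * global_sens; starting from that value keeps
    the result a valid beta-smooth upper bound.
    """
    best = math.exp(-beta * (max_k + 1)) * global_sens
    for k in range(max_k + 1):
        if math.exp(-beta * k) * global_sens <= best:
            break  # no later term can exceed the current maximum
        a_k = min(local_sens_at_distance(k), global_sens)
        best = max(best, math.exp(-beta * k) * a_k)
    return best


def release_smooth(true_count, smooth_sens, epsilon, rng):
    """Add Laplace noise with scale S(x) / alpha, where alpha = epsilon / 2."""
    return true_count + laplace_noise(smooth_sens / (epsilon / 2.0), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    true_count = count_distance_outliers(pts, r=1.5, k=2)  # only (10, 10) is an outlier
    # Hypothetical global sensitivity; the paper derives its own bound.
    print("global:", release_global(true_count, global_sens=100, epsilon=1.0, rng=rng))
    eps, delta = 1.0, 1e-5
    # Toy profile A_k = 1 + k; NOT a proven local-sensitivity bound for outlier counts.
    s = smooth_upper_bound(lambda k: 1 + k, 100, smooth_beta(eps, delta))
    print("smooth:", release_smooth(true_count, s, eps, rng))
```

Because every A_k is capped at the global sensitivity, the smooth bound S never exceeds it. Its noise scale, however, is 2S/epsilon rather than GS/epsilon, and it only yields (epsilon, delta)-differential privacy, so the approach pays off only when the local-sensitivity profile near the actual dataset stays well below the global value, which is the situation the abstract describes for small outlier counts.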
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Okada, R., Fukuchi, K., Sakuma, J. (2015). Differentially Private Analysis of Outliers. In: Appice, A., Rodrigues, P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9285. Springer, Cham. https://doi.org/10.1007/978-3-319-23525-7_28
DOI: https://doi.org/10.1007/978-3-319-23525-7_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23524-0
Online ISBN: 978-3-319-23525-7
eBook Packages: Computer Science, Computer Science (R0)