Abstract
It is well recognised that data mining and statistical analysis pose a serious threat to privacy. This is true for financial, medical, criminal and marketing research. Numerous techniques have been proposed to protect privacy, including restriction and data modification. Recently proposed privacy models such as differential privacy and k-anonymity have received a lot of attention, and for the latter there are now several improvements of the original scheme, each removing some security shortcomings of the previous one. However, the challenge lies in evaluating and comparing the privacy provided by various techniques. In this paper we propose a novel entropy-based security measure that can be applied to any generalisation, restriction or data modification technique. We use our measure to empirically evaluate and compare a few popular methods, namely query restriction, sampling and noise addition.
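The abstract does not define the proposed measure, so the following is only a minimal sketch of the general idea behind entropy-based disclosure risk, not the authors' method. It assumes an intruder holds a probability distribution over the possible values of one confidential attribute, and it uses the drop in Shannon entropy after a protected release as a rough indicator of how much was disclosed. The function name and the probabilities are hypothetical and chosen purely for illustration.

```python
import math


def shannon_entropy(probs):
    """Return the Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical intruder beliefs about a confidential value with four
# possible outcomes. These numbers are made up for illustration only.
prior = [0.25, 0.25, 0.25, 0.25]   # before seeing any released data
posterior = [0.7, 0.1, 0.1, 0.1]   # after seeing a protected release

h_prior = shannon_entropy(prior)
h_post = shannon_entropy(posterior)

# A larger drop in entropy means the release reduced the intruder's
# uncertainty more, i.e. a higher disclosure risk under this toy model.
entropy_loss = h_prior - h_post

print(f"prior entropy:     {h_prior:.3f} bits")
print(f"posterior entropy: {h_post:.3f} bits")
print(f"entropy loss:      {entropy_loss:.3f} bits")
```

Under this toy model, a protection technique that leaves the posterior close to the prior (small entropy loss) protects the value better than one that concentrates the intruder's belief on a single outcome.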
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Alfalayleh, M., Brankovic, L. (2015). Quantifying Privacy: A Novel Entropy-Based Measure of Disclosure Risk. In: Jan, K., Miller, M., Froncek, D. (eds) Combinatorial Algorithms. IWOCA 2014. Lecture Notes in Computer Science, vol 8986. Springer, Cham. https://doi.org/10.1007/978-3-319-19315-1_3
DOI: https://doi.org/10.1007/978-3-319-19315-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19314-4
Online ISBN: 978-3-319-19315-1
eBook Packages: Computer Science (R0)