International Workshop on Combinatorial Algorithms

IWOCA 2014: Combinatorial Algorithms, pp 24-36

Quantifying Privacy: A Novel Entropy-Based Measure of Disclosure Risk

  • Mousa Alfalayleh
  • Ljiljana Brankovic
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8986)


It is well recognised that data mining and statistical analysis pose a serious threat to privacy. This is true for financial, medical, criminal and marketing research. Numerous techniques have been proposed to protect privacy, including restriction and data modification. Recently proposed privacy models such as differential privacy and k-anonymity have received a lot of attention, and for the latter there are now several improvements of the original scheme, each removing some security shortcomings of the previous one. However, the challenge lies in evaluating and comparing the privacy provided by various techniques. In this paper we propose a novel entropy-based security measure that can be applied to any generalisation, restriction or data modification technique. We use our measure to empirically evaluate and compare a few popular methods, namely query restriction, sampling and noise addition.
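The abstract does not state the formula of the proposed measure, so the following is only a generic sketch of the idea behind entropy-based disclosure measures, not the authors' definition. It assumes that an intruder's uncertainty about one confidential value can be modelled as a discrete distribution over candidate values, and it uses Shannon entropy in bits. The function names and the sample values are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def candidate_entropy(candidates):
    """Entropy of the empirical distribution over candidate values.

    Assumption (not from the paper): the intruder treats every listed
    candidate as equally plausible, so repeated values carry more weight.
    """
    counts = Counter(candidates)
    total = sum(counts.values())
    return shannon_entropy(count / total for count in counts.values())


# Hypothetical candidate values for one confidential attribute before
# any release: eight equally plausible values.
before = [10, 20, 30, 40, 50, 60, 70, 80]

# Hypothetical situation after a release that rules out all but two values.
after = [30, 40]

h_before = candidate_entropy(before)
h_after = candidate_entropy(after)

# A larger drop in entropy means the release told the intruder more.
entropy_loss = h_before - h_after
print(f"before: {h_before:.2f} bits, after: {h_after:.2f} bits, "
      f"loss: {entropy_loss:.2f} bits")
```

Under these assumptions, a technique that leaves the intruder with more remaining entropy discloses less about the value, which is the kind of comparison an entropy-based measure allows across query restriction, sampling and noise addition.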


Keywords: Range query, Security measure, Differential privacy, Disclosure risk, Privacy model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, Australia
