Abstract
We consider the problem of securing a statistical database using the well-known micro-aggregation strategy, and in particular the k-Ward strategy introduced in [1] and utilized in [2]. The latter scheme, which represents the state of the art, coalesces the sorted data attribute values into groups and, on being queried, reports the means of the corresponding groups. We demonstrate that such a scheme can be optimized on two fronts. First, we minimize the computations done in evaluating the between-class distance matrix, so that only a constant number of updating distance computations is required. Second, and more importantly, we propose that the data set be partitioned recursively before a k-Ward strategy is invoked, and that the latter be invoked on the “primitive” sub-groups which terminate the recursion. Our experimental results, obtained on two benchmark data sets, demonstrate a marked improvement. While the information loss is comparable to that of the k-Ward micro-aggregation technique proposed by Domingo-Ferrer et al. [2], the computations required to achieve this loss are a fraction of those required by the latter. The computational advantage sometimes exceeds 80% if one method is used by itself, and is more than 90% if both enhancements are invoked simultaneously.
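The basic micro-aggregation idea described above (sort the attribute values, coalesce them into groups, and report group means) can be illustrated with a minimal sketch. This is not the k-Ward algorithm or either of the proposed enhancements: it assumes a single numeric attribute and simple fixed-size grouping of sorted neighbours. The function names `microaggregate` and `information_loss` are hypothetical, and the loss measure SSE/SST is assumed here as a common choice in the micro-aggregation literature rather than taken from the abstract.

```python
from statistics import mean


def microaggregate(values, k):
    """Replace each value by the mean of its group of sorted neighbours.

    Assumption: fixed-size grouping on one attribute; every group has at
    least k records, and the last group absorbs any leftover records.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    # Indices of the records in ascending order of their value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    masked = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        group = order[start:end]
        group_mean = mean(values[i] for i in group)
        for i in group:
            masked[i] = group_mean
    return masked


def information_loss(values, masked):
    """Within-group sum of squares divided by total sum of squares."""
    overall = mean(values)
    sse = sum((v - m) ** 2 for v, m in zip(values, masked))
    sst = sum((v - overall) ** 2 for v in values)
    return sse / sst if sst else 0.0


data = [12.0, 1.0, 11.0, 2.0, 10.0, 3.0, 13.0]
masked = microaggregate(data, k=3)
print(masked)
print(information_loss(data, masked))
```

With k = 3, the three smallest values form one group and the remaining four form another, so each reported value is shared by at least three records. A lower ratio from `information_loss` means the masked data stay closer to the originals.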
References
Ward, J.H.: Hierarchical grouping to optimize an objective function. J. American Statistical Association 58, 236–245 (1963)
Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering 14, 189–201 (2002)
Adam, N.R., Wortmann, J.C.: Security-control methods for statistical databases: A comparative study. ACM Computing Surveys 21, 515–556 (1989)
Baeyens, Y., Defays, D.: Estimation of variance loss following microaggregation by the individual ranking method. In: Proceedings of Statistical Data Protection 1998, pp. 101–108. Office for Official Publications of the European Communities, Luxembourg (1999)
Cuppen, M.: Source Data Perturbation in Statistical Disclosure Control. PhD thesis, Statistics Netherlands (2000)
Mateo-Sanz, J.M., Domingo-Ferrer, J.: A method for data-oriented multivariate microaggregation. In: Proceedings of Statistical Data Protection 1998, pp. 89–99. Office for Official Publications of the European Communities, Luxembourg (1999)
Hansen, S.L., Mukherjee, S.: A polynomial algorithm for univariate optimal microaggregation. IEEE Trans. on Know. and Data Eng. 15, 1043–1044 (2003)
Laszlo, M., Mukherjee, S.: Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans. on Know. and Data Eng. 17, 902–911 (2005)
Mateo-Sanz, J.M., Domingo-Ferrer, J.: A comparative study of microaggregation methods. Questiio 22, 511–526 (1998)
Solanas, A., Martínez-Ballesté, A., Domingo-Ferrer, J., Mateo-Sanz, J.: A 2d-tree-based blocking method for microaggregating very large data sets. In: The First International Conference on Availability, Reliability and Security (2006)
Defays, D., Nanopoulos, P.: Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of 92 Symposium on Design and Analysis of Longitudinal Surveys, pp. 195–204. Statistics Canada, Ottawa (1993)
Defays, D., Anwar, N.: Micro-aggregation: A generic method. In: Proceedings of the 2nd International Symposium on Statistical Confidentiality, pp. 69–78. Office for Official Publications of the European Communities, Luxembourg (1995)
Solanas, A., Martínez-Ballesté, A.: V-mdav: A multivariate microaggregation with variable group size. In: 17th COMPSTAT Symposium of the IASC, Rome (2006)
Li, Y., Zhu, S., Wang, L., Jajodia, S.: A privacy-enhanced microaggregation method. In: Eiter, T., Schewe, K.-D. (eds.) FoIKS 2002. LNCS, vol. 2284, pp. 148–159. Springer, Heidelberg (2002)
Domingo-Ferrer, J., Mateo-Sanz, J.M.: Resampling for statistical confidentiality in contingency tables. Comp. and Math. with App. 38, 13–32 (1999)
Fayyoumi, E., Oommen, B.J.: (Enhancing k-ward micro-aggregation for secure statistical databases using distance-based and recursive optimizations) Unabridged Version of This Paper
Brucker, P.: On the complexity of clustering problems. In: Hehn, R., Korte, B., Oettli, W. (eds.) Optimization and Operations Research, pp. 45–54 (1977)
Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 113–134. Springer, Berlin (2002)
Brand, R., Domingo-Ferrer, J., Mateo-Sanz, J.M.: Reference data sets to test and compare SDC methods for protection of numerical microdata. Technical report, CASC PROJECT, Computational Aspects of Statistical Confidentiality (2002)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fayyoumi, E., Oommen, B.J. (2006). On Optimizing the k-Ward Micro-aggregation Technique for Secure Statistical Databases. In: Batten, L.M., Safavi-Naini, R. (eds) Information Security and Privacy. ACISP 2006. Lecture Notes in Computer Science, vol 4058. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780656_27
DOI: https://doi.org/10.1007/11780656_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35458-1
Online ISBN: 978-3-540-35459-8
eBook Packages: Computer Science (R0)