A General Framework and Metrics for Longitudinal Data Anonymization
The bulk of methods in statistical disclosure control deal primarily with individual data from a cross-sectional perspective, i.e., data in which individuals are observed at a single point in time. However, longitudinal data, i.e., data in which individuals are observed over multiple periods, are increasingly collected. Such data undoubtedly widen the possibilities for statistical analysis compared with cross-sectional data, but they also carry additional layers of information that must remain practically useful while being protected in a privacy-preserving way. Building on the recently proposed permutation paradigm as an overarching approach to data anonymization, this paper establishes a general framework for formulating longitudinal data anonymization and proposes universal metrics for assessing disclosure risk and information loss. We illustrate the application of these new tools with an empirical example.
Keywords: Statistical disclosure control, Longitudinal data, Permutation paradigm