Abstract
Transaction data are important to applications such as marketing analysis and medical studies. However, such data can contain personal information and thus must be sanitised before use. One popular approach to protecting transaction data is set-based generalisation, in which an item in a transaction is replaced by a set of items. In this paper, we study how well transaction data can be protected by this approach. More specifically, we propose de-anonymisation methods that aim to reconstruct the original transaction data from its set-generalised version by analysing the semantic relationships that exist among the items. Our experiments on both real and synthetic data show that set-based generalisation may not provide adequate protection for transaction data: about 50% of the items added to the transactions during generalisation can be detected by our method with a precision greater than 80%.
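The Python sketch below illustrates the general idea in miniature; it is not the authors' method. It assumes that items added during generalisation tend to be semantically unrelated to the rest of the transaction. The similarity table, the 0.3 threshold, and the functions `generalise`, `relatedness`, and `flag_added_items` are all hypothetical. A real attack would derive relatedness scores from a semantic resource or a text corpus rather than from a hand-written table.

```python
# Hypothetical pairwise relatedness scores in [0, 1] (assumption: a real
# attack would compute these from a semantic resource or corpus).
SIMILARITY = {
    frozenset({"bread", "butter"}): 0.9,
    frozenset({"bread", "milk"}): 0.7,
    frozenset({"butter", "milk"}): 0.8,
    frozenset({"bread", "motor oil"}): 0.05,
}


def similarity(a: str, b: str) -> float:
    """Look up the relatedness of two items; unknown pairs score 0."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset({a, b}), 0.0)


def generalise(transaction: list[str], item: str, decoys: list[str]) -> list[frozenset]:
    """Set-based generalisation: replace `item` by {item} plus the decoys.

    Every other item becomes a singleton set, so the published
    transaction is a list of item sets.
    """
    return [
        (frozenset({x}) | frozenset(decoys)) if x == item else frozenset({x})
        for x in transaction
    ]


def relatedness(candidate: str, context: set[str]) -> float:
    """Mean similarity between a candidate item and the context items."""
    if not context:
        return 0.0
    return sum(similarity(candidate, c) for c in context) / len(context)


def flag_added_items(transaction: list[frozenset], threshold: float = 0.3) -> set[str]:
    """Flag items in generalised sets that look unrelated to the rest.

    Heuristic (hypothetical): an item whose mean relatedness to the items
    in the other positions of the transaction is below `threshold` is
    suspected to have been added during generalisation.
    """
    suspects: set[str] = set()
    for i, gen_set in enumerate(transaction):
        if len(gen_set) == 1:
            continue  # singleton sets were not generalised
        # Context: every item that appears in the other positions.
        context = set().union(*(s for j, s in enumerate(transaction) if j != i))
        for item in gen_set:
            if relatedness(item, context) < threshold:
                suspects.add(item)
    return suspects


published = generalise(["bread", "milk", "butter"], "butter", ["motor oil"])
print(flag_added_items(published))  # {'motor oil'}
```

In this toy example, "butter" scores a mean relatedness of 0.85 against {bread, milk}, while "motor oil" scores 0.025 and is flagged. Using a simple mean over the context keeps the sketch short. A stronger attack could weight context items, or could discount context items that are themselves suspected decoys.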
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ong, H., Shao, J. (2014). De-anonymising Set-Generalised Transactions Based on Semantic Relationships. In: Dang, T.K., Wagner, R., Neuhold, E., Takizawa, M., Küng, J., Thoai, N. (eds) Future Data and Security Engineering. FDSE 2014. Lecture Notes in Computer Science, vol 8860. Springer, Cham. https://doi.org/10.1007/978-3-319-12778-1_9
DOI: https://doi.org/10.1007/978-3-319-12778-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12777-4
Online ISBN: 978-3-319-12778-1
eBook Packages: Computer Science, Computer Science (R0)