Frequent Sequence Pattern Mining with Differential Privacy
Releasing frequent sequence patterns together with their true supports may reveal individuals' private information when the dataset contains sensitive data. To address this issue, a Differentially Private Frequent Sequence Mining (DPFSM) algorithm is proposed. The downward closure property is used to generate a candidate set of sequence patterns, a smart truncation-based technique is used to sample frequent patterns from the candidate set, and the geometric mechanism is used to perturb the true support of each sampled pattern. In addition, to improve the utility of the results, a threshold modification method is proposed to reduce the truncation error and propagation error in the mining process. Theoretical analysis shows that the proposed method satisfies ε-differential privacy. Experimental results demonstrate that the proposed method achieves a lower False Negative Rate (FNR) and Relative Support Error (RSE) than the baseline algorithm PFS2, thus improving the accuracy of the mining results.
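The abstract names the building blocks of DPFSM but not their details. The Python below is a minimal sketch under stated assumptions: patterns are sequences of single items, a record supports a pattern if it contains the pattern as a (not necessarily contiguous) subsequence, each record counts at most once per pattern, and every released support is perturbed independently with a hypothetical per-pattern budget `epsilon` and sensitivity 1. It shows downward-closure candidate generation (a GSP-style join and prune) and the two-sided geometric mechanism for integer supports. The smart truncation step, the budget allocation, and the threshold modification are not sketched because the abstract does not describe them. All function names are hypothetical.

```python
import math
import random


def is_subsequence(pattern, seq):
    """True if the items of `pattern` occur in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of records containing `pattern`; each record counts at most once."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def generate_candidates(frequent_k):
    """Build (k+1)-candidates from frequent k-sequences (GSP-style join + prune).

    Downward closure: every subsequence of a frequent sequence is frequent, so a
    candidate is dropped if removing any single item yields an infrequent k-sequence.
    """
    frequent = set(frequent_k)
    candidates = set()
    for s1 in frequent:
        for s2 in frequent:
            if s1[1:] == s2[:-1]:  # s1 and s2 overlap on k-1 items
                cand = s1 + (s2[-1],)
                if all(cand[:i] + cand[i + 1:] in frequent for i in range(len(cand))):
                    candidates.add(cand)
    return candidates


def geometric_noise(epsilon, sensitivity=1, rng=random):
    """Two-sided geometric noise: P(Z = z) proportional to exp(-epsilon * |z| / sensitivity)."""
    log_alpha = -epsilon / sensitivity  # log of alpha = exp(-epsilon / sensitivity)

    def one_sided():
        # Inverse-CDF sample of G with P(G = k) = (1 - alpha) * alpha**k, k = 0, 1, 2, ...
        u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
        return math.floor(math.log(u) / log_alpha)

    # The difference of two i.i.d. one-sided geometric variables is two-sided geometric.
    return one_sided() - one_sided()


def noisy_supports(patterns, database, epsilon, sensitivity=1, rng=random):
    """Perturb each pattern's true support independently (hypothetical per-pattern budget)."""
    return {p: support(p, database) + geometric_noise(epsilon, sensitivity, rng) for p in patterns}


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b", "c", "d")]
    level2 = generate_candidates({("a",), ("b",), ("c",)})
    print(noisy_supports(sorted(level2), db, epsilon=1.0, rng=random.Random(7)))
```

Noisy supports can be negative or exceed the number of records; clamping them afterwards is post-processing and does not weaken the differential privacy guarantee. Geometric noise is used rather than Laplace noise because supports are integers, which matches the mechanism named in the abstract.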
Keywords: Frequent sequence mining; Differential Privacy (DP); Privacy protection; Geometric mechanism; Data mining
This work was supported in part by the Research Project of the Hubei Provincial Department of Education (No. B2017590).