Frequent Sequence Pattern Mining with Differential Privacy

  • Fengli Zhou
  • Xiaoli Lin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10954)

Abstract

To address the risk that releasing frequent sequence patterns together with their true supports may compromise individuals' privacy when the data set contains sensitive information, a Differential Private Frequent Sequence Mining (DPFSM) algorithm was proposed. The downward closure property was used to generate a candidate set of sequence patterns, a smart-truncation-based technique was used to sample frequent patterns from the candidate set, and the geometric mechanism was used to perturb the true support of each sampled pattern. In addition, to improve the usability of the results, a threshold modification method was proposed to reduce truncation error and propagation error in the mining process. Theoretical analysis shows that the proposed method is ε-differentially private. Experimental results demonstrate that the proposed method has a lower False Negative Rate (FNR) and a lower Relative Support Error (RSE) than the comparison algorithm PFS2, thus effectively improving the accuracy of the mining results.
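
The following is a minimal Python sketch of two of the building blocks named in the abstract: candidate generation via the downward closure property, and perturbation of a support count with the geometric mechanism. It is not the authors' DPFSM implementation. All function names are hypothetical, patterns are assumed to match as subsequences with gaps allowed, and the noise scale assumes each individual contributes one sequence, so a single support count has sensitivity 1. The truncation-based sampling, the threshold modification, and the division of the privacy budget across patterns are not shown.

```python
import math
import random


def contains(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` as a subsequence (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(database, pattern):
    """Count the sequences in `database` that contain `pattern`."""
    return sum(contains(seq, pattern) for seq in database)


def generate_candidates(frequent_prev):
    """Join frequent (k-1)-patterns into k-pattern candidates.

    Downward closure: a k-pattern can only be frequent if every
    (k-1)-pattern obtained by deleting one item is itself frequent.
    """
    prev = set(frequent_prev)
    candidates = set()
    for p in prev:
        for q in prev:
            if p[1:] == q[:-1]:
                cand = p + (q[-1],)
                if all(cand[:i] + cand[i + 1:] in prev for i in range(len(cand))):
                    candidates.add(cand)
    return candidates


def two_sided_geometric(epsilon, rng=random):
    """Sample integer noise k with probability proportional to exp(-epsilon * |k|).

    Assumes sensitivity 1 and epsilon > 0 (an assumption of this sketch).
    """
    alpha = math.exp(-epsilon)

    def geometric():
        # Inverse-CDF sample of a geometric variable on {0, 1, 2, ...}.
        u = 1.0 - rng.random()  # uniform on (0, 1]
        return int(math.floor(math.log(u) / math.log(alpha)))

    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return geometric() - geometric()


def noisy_support(database, pattern, epsilon, rng=random):
    """Return the true support of `pattern` plus two-sided geometric noise."""
    return support(database, pattern) + two_sided_geometric(epsilon, rng)
```

For example, generate_candidates({('a', 'b'), ('b', 'c'), ('a', 'c')}) yields the single candidate ('a', 'b', 'c'), while dropping ('a', 'c') from the input yields no candidates, because one of the 2-item sub-patterns is then missing. Because the geometric mechanism adds integer noise, the perturbed supports remain integers.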

Keywords

Frequent sequence mining, Differential Privacy (DP), Privacy protection, Geometric mechanism, Data mining

Notes

Acknowledgment

This work was supported in part by Research Project of Hubei Provincial Department of Education (No. B2017590).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Faculty of Information Technology, Wuhan College of Foreign Language and Foreign Affairs, Wuhan, China
