FAANST: Fast Anonymizing Algorithm for Numerical Streaming DaTa

  • Hessam Zakerzadeh
  • Sylvia L. Osborn
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6514)

Abstract

Streaming data is widely used in today’s world. Data arrives from different sources in streams and must be processed online with minimal delay. These data streams often contain confidential data, such as customers’ purchase information, and need to be mined to reveal other useful information, such as customers’ purchase patterns. Privacy preservation throughout these processes plays a crucial role. K-anonymity is a well-known technique for preserving privacy. The principal issues in k-anonymity are data loss and running time. Although some existing k-anonymity techniques can generate anonymized data with acceptable data loss, their main drawback is that they are very time consuming, which makes them unsuitable for a streaming context, where data is usually very sensitive to delay and must be processed quickly. In this paper, we propose a cluster-based k-anonymity algorithm called FAANST (Fast Anonymizing Algorithm for Numerical Streaming daTa), which can anonymize numerical streaming data quickly while keeping data loss at an admissible level. We also show that FAANST can be easily extended to support data streams consisting of categorical values as well as numerical values.
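
The abstract does not spell out FAANST's clustering steps, so the following is only a minimal sketch of the general idea behind cluster-based k-anonymity on one window of numerical records. It is not the authors' algorithm. The helper names `anonymize_window` and `information_loss`, the sort-and-cut grouping, and the range-width loss measure are all assumptions made for illustration.

```python
from typing import List, Tuple

Record = Tuple[float, ...]
Range = Tuple[float, float]


def anonymize_window(window: List[Record], k: int) -> List[List[Range]]:
    """Replace each record by the bounding ranges of a group of >= k records.

    Hypothetical helper: a simple sort-and-cut grouping, not the FAANST procedure.
    """
    if k < 1 or len(window) < k:
        raise ValueError("window must hold at least k records")
    # Sort indices so that neighbouring records are similar, then cut into groups of k.
    order = sorted(range(len(window)), key=lambda i: window[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # A short final group would violate k-anonymity, so merge it into the previous one.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    published: List[List[Range]] = [[] for _ in window]
    for group in groups:
        members = [window[i] for i in group]
        # Generalize every attribute to the [min, max] range seen in the group.
        ranges = [(min(col), max(col)) for col in zip(*members)]
        for i in group:
            published[i] = ranges
    return published


def information_loss(published: List[List[Range]], domain: List[Range]) -> float:
    """Mean published range width, normalized per attribute by its domain width."""
    total = 0.0
    for ranges in published:
        total += sum((hi - lo) / (dhi - dlo)
                     for (lo, hi), (dlo, dhi) in zip(ranges, domain))
    return total / (len(published) * len(domain))


# Illustrative (age, income) records; the values are made up for this sketch.
window = [(25, 50000), (27, 52000), (41, 90000), (43, 88000), (60, 30000)]
published = anonymize_window(window, k=2)
loss = information_loss(published, domain=[(20, 70), (20000, 100000)])
```

Each published record shares its ranges with at least k - 1 others, which is the k-anonymity guarantee. Wider ranges mean more data loss, which is the trade-off against running time that the abstract describes.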

Keywords

Data Stream, Information Loss, Data Loss, Window Processing, Privacy Preservation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Hessam Zakerzadeh (1)
  • Sylvia L. Osborn (1)
  1. Department of Computer Science, The University of Western Ontario, Canada
