Data Swapping: Balancing Privacy against Precision in Mining for Logic Rules

  • Conference paper
DataWarehousing and Knowledge Discovery (DaWaK 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1676)

Abstract

The recent proliferation of data mining tools for the analysis of large volumes of data has paid little attention to individual privacy issues. Here, we introduce methods aimed at finding a balance between individuals’ right to privacy and data miners’ need to find general patterns in huge volumes of detailed records. In particular, we focus on the data-mining task of classification with decision trees. We base our security-control mechanism on noise-addition techniques used in statistical databases because (1) the multidimensional matrix model of statistical databases and the multidimensional cubes of On-Line Analytical Processing (OLAP) are essentially the same, and (2) noise-addition techniques are very robust. The main drawback of noise-addition techniques in the context of statistical databases is the low statistical quality of the released statistics. We argue that in data mining the major requirement of a security-control mechanism (in addition to protecting privacy) is not to ensure precise and bias-free statistics, but rather to preserve the high-level descriptions of knowledge constructed by artificial data mining tools.
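
As a rough illustration of the kind of perturbation the title refers to, the sketch below swaps the values of one attribute between randomly chosen pairs of records. It is a generic data-swapping example, not the procedure proposed in the paper, whose details are not given in this abstract. The function name `swap_attribute`, the `fraction` and `seed` parameters, and the toy records are all assumptions made for illustration.

```python
import random
from collections import Counter


def swap_attribute(records, attribute, fraction=0.2, seed=0):
    """Return a perturbed copy of `records` in which the values of
    `attribute` are exchanged between randomly chosen pairs of records.

    Hypothetical helper for illustration only. `fraction` is the
    approximate share of records that take part in a swap.
    """
    rng = random.Random(seed)
    perturbed = [dict(r) for r in records]  # copy, so the input is not modified
    n_pairs = int(len(perturbed) * fraction / 2)
    # Pick 2 * n_pairs distinct record indices and pair them up.
    chosen = rng.sample(range(len(perturbed)), 2 * n_pairs)
    for i, j in zip(chosen[::2], chosen[1::2]):
        perturbed[i][attribute], perturbed[j][attribute] = (
            perturbed[j][attribute],
            perturbed[i][attribute],
        )
    return perturbed


# Toy data, invented for this example.
records = [
    {"age": 34, "diagnosis": "benign"},
    {"age": 51, "diagnosis": "malignant"},
    {"age": 47, "diagnosis": "benign"},
    {"age": 29, "diagnosis": "benign"},
    {"age": 63, "diagnosis": "malignant"},
    {"age": 58, "diagnosis": "benign"},
]

released = swap_attribute(records, "diagnosis", fraction=0.5)

# Swapping only moves values between records, so the overall frequency
# of each attribute value is unchanged.
original_counts = Counter(r["diagnosis"] for r in records)
released_counts = Counter(r["diagnosis"] for r in released)
```

Because values are only exchanged, the released data keeps the same overall count of each attribute value, while the link between an individual record and its value becomes uncertain. Whether a decision tree built from the released data resembles one built from the original depends on which records are paired, which is the trade-off between privacy and precision that the abstract describes.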

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Estivill-Castro, V., Brankovic, L. (1999). Data Swapping: Balancing Privacy against Precision in Mining for Logic Rules. In: Mohania, M., Tjoa, A.M. (eds) DataWarehousing and Knowledge Discovery. DaWaK 1999. Lecture Notes in Computer Science, vol 1676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48298-9_41

  • DOI: https://doi.org/10.1007/3-540-48298-9_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66458-1

  • Online ISBN: 978-3-540-48298-7

  • eBook Packages: Springer Book Archive
