Abstract
The recent proliferation of data mining tools for the analysis of large volumes of data has been accompanied by little attention to individual privacy issues. Here, we introduce methods aimed at finding a balance between the individuals’ right to privacy and the data-miners’ need to find general patterns in huge volumes of detailed records. In particular, we focus on the data-mining task of classification with decision trees. We base our security-control mechanism on noise-addition techniques used in statistical databases because (1) the multidimensional matrix model of statistical databases and the multidimensional cubes of On-Line Analytical Processing (OLAP) are essentially the same, and (2) noise-addition techniques are very robust. The main drawback of noise-addition techniques in the context of statistical databases is the low statistical quality of the released statistics. We argue that in data mining the major requirement of a security-control mechanism (in addition to protecting privacy) is not to ensure precise and bias-free statistics, but rather to preserve the high-level descriptions of knowledge constructed by artificial data mining tools.
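To make the idea of perturbing records before release concrete, the following is a minimal sketch of generic data swapping, one member of the noise-addition family. It is not the procedure developed in the paper, which the abstract does not specify. The function name `swap_attribute`, the parameter `swap_fraction`, and the toy records are all hypothetical and chosen for illustration only. The sketch permutes the values of one attribute among a random subset of records, so individual records no longer carry their true value while the overall value counts for that attribute are unchanged.

```python
import random
from collections import Counter


def swap_attribute(records, attribute, swap_fraction=0.2, seed=0):
    """Return a perturbed copy of `records` (hypothetical helper).

    The values of `attribute` are permuted among a randomly chosen
    subset of records. A record in the subset may keep its original
    value by chance. The multiset of values is always preserved.
    """
    rng = random.Random(seed)
    perturbed = [dict(r) for r in records]  # copy; the input is not modified
    k = int(len(perturbed) * swap_fraction)  # number of records to perturb
    chosen = rng.sample(range(len(perturbed)), k)
    values = [perturbed[i][attribute] for i in chosen]
    rng.shuffle(values)  # random permutation of the selected values
    for i, v in zip(chosen, values):
        perturbed[i][attribute] = v
    return perturbed


# Toy data, invented for this illustration.
records = [
    {"age": 25, "income": "low", "label": "no"},
    {"age": 41, "income": "high", "label": "yes"},
    {"age": 33, "income": "mid", "label": "no"},
    {"age": 58, "income": "high", "label": "yes"},
    {"age": 29, "income": "low", "label": "no"},
    {"age": 47, "income": "mid", "label": "yes"},
]

released = swap_attribute(records, "label", swap_fraction=0.5, seed=1)

# The marginal distribution of the swapped attribute is unchanged.
print(Counter(r["label"] for r in records) == Counter(r["label"] for r in released))
```

Whether a decision tree induced from the released records resembles one induced from the originals depends on how many values are swapped and on which records exchange values; the sketch makes no claim about that trade-off.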
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Estivill-Castro, V., Brankovic, L. (1999). Data Swapping: Balancing Privacy against Precision in Mining for Logic Rules. In: Mohania, M., Tjoa, A.M. (eds) DataWarehousing and Knowledge Discovery. DaWaK 1999. Lecture Notes in Computer Science, vol 1676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48298-9_41
DOI: https://doi.org/10.1007/3-540-48298-9_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66458-1
Online ISBN: 978-3-540-48298-7
eBook Packages: Springer Book Archive