Abstract
Data mining introduces new problems in database security. The basic problem of using non-sensitive data to infer sensitive data is made more difficult by the “probabilistic” inferences possible with data mining. This paper shows how lower bounds from pattern recognition theory can be used to determine sample sizes small enough that data mining tools cannot obtain reliable results.
The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-0-387-35508-5_22
Copyright information
© 2000 IFIP International Federation for Information Processing
About this chapter
Cite this chapter
Clifton, C. (2000). Protecting Against Data Mining through Samples. In: Atluri, V., Hale, J. (eds) Research Advances in Database and Information Systems Security. IFIP — The International Federation for Information Processing, vol 43. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-35508-5_13
DOI: https://doi.org/10.1007/978-0-387-35508-5_13
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4757-6411-6
Online ISBN: 978-0-387-35508-5
eBook Packages: Springer Book Archive