Abstract
When dealing with personal data, data miners need algorithms that can discover trends and patterns in the data without exposing individuals’ private information. Differential privacy offers an enforceable definition of privacy: it guarantees each individual in a dataset that their personal information is essentially no more at risk than it would be if their data were not in the dataset at all. We propose a decision forest algorithm, built from mechanisms that achieve differential privacy, that uses the theory of Signal-to-Noise Ratios to tune its parameters automatically and to ensure that the differentially private noise added to the results does not outweigh the true results. Our experiments demonstrate that our differentially private algorithm can achieve high prediction accuracy.
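The abstract's central idea, that differentially private noise should not outweigh the true results, can be illustrated with a small sketch. It assumes the forest releases per-class record counts in each leaf through the Laplace mechanism, defines the signal-to-noise ratio as an expected true count divided by the standard deviation of the noise, and assumes records spread evenly over leaves and classes. The function names and the depth heuristic (`max_depth_for_snr`, `min_snr`) are hypothetical and only illustrate the principle; they are not the authors' published parameter-tuning rule.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale)."""
    # The difference of two i.i.d. Exponential(rate = 1/scale) variables is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_class_counts(counts: dict, epsilon: float, rng: random.Random) -> dict:
    """Release the per-class counts of one leaf with the Laplace mechanism.

    Adding or removing one record changes exactly one class count by 1, so the
    histogram has sensitivity 1 and the noise scale is 1 / epsilon.
    """
    scale = 1.0 / epsilon
    return {label: c + laplace_noise(scale, rng) for label, c in counts.items()}


def signal_to_noise(expected_count: float, epsilon: float) -> float:
    """Ratio of an expected true count to the std. dev. of Laplace(1/epsilon) noise."""
    noise_std = math.sqrt(2.0) / epsilon  # Var[Laplace(b)] = 2 * b**2
    return expected_count / noise_std


def max_depth_for_snr(n_records: int, n_classes: int, branching: int,
                      epsilon: float, min_snr: float = 1.0) -> int:
    """Hypothetical heuristic: the deepest tree whose leaves keep SNR >= min_snr.

    Assumes records are spread evenly over leaves and classes. Terminates because
    the expected per-class count shrinks towards 0 as depth grows (min_snr > 0).
    """
    depth = 0
    while True:
        next_leaves = branching ** (depth + 1)
        per_class = n_records / (next_leaves * n_classes)
        if signal_to_noise(per_class, epsilon) < min_snr:
            return depth
        depth += 1
```

For example, with 10,000 records, 2 classes, binary splits and a leaf budget of epsilon = 1, `max_depth_for_snr(10000, 2, 2, 1.0)` returns 11: one level deeper, the expected per-class count (about 1.22) falls below the noise standard deviation (about 1.41). A smaller epsilon means more noise and therefore shallower trees under this heuristic.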
Notes
1. The class attribute is the attribute whose value the user wishes to predict accurately for future records in which that value is unknown.
2. Our code can be found at http://csusap.csu.edu.au/zislam/, or can be obtained by emailing the authors.
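Note 1 defines the class attribute as the value a user wants to predict for new records. In a decision forest, a common way to produce that prediction is for each tree to predict the class value with the largest count in the leaf the record reaches, then to take a majority vote across trees. The sketch below assumes leaves store noisy class counts; the function names are hypothetical, and this plain majority vote is not necessarily the aggregation the authors use.

```python
from collections import Counter


def leaf_prediction(noisy_counts: dict):
    """Predict the class value with the largest (possibly noisy) count in a leaf."""
    return max(noisy_counts, key=noisy_counts.get)


def forest_predict(leaf_counts_per_tree: list):
    """Majority vote over per-tree leaf predictions for a single record."""
    votes = Counter(leaf_prediction(counts) for counts in leaf_counts_per_tree)
    return votes.most_common(1)[0][0]


# One record reaches one leaf in each of three trees; two trees vote "yes".
print(forest_predict([
    {"yes": 3.2, "no": 1.1},
    {"yes": 0.4, "no": 2.9},
    {"yes": 5.0, "no": -0.7},
]))  # prints "yes"
```

Noisy counts can be negative, as in the third leaf; taking the maximum still works because only the relative order of counts matters.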
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Fletcher, S., Islam, M.Z. (2015). A Differentially Private Random Decision Forest Using Reliable Signal-to-Noise Ratios. In: Pfahringer, B., Renz, J. (eds) AI 2015: Advances in Artificial Intelligence. AI 2015. Lecture Notes in Computer Science, vol 9457. Springer, Cham. https://doi.org/10.1007/978-3-319-26350-2_17
DOI: https://doi.org/10.1007/978-3-319-26350-2_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26349-6
Online ISBN: 978-3-319-26350-2
eBook Packages: Computer Science, Computer Science (R0)