Abstract
In recent years, machine learning techniques are widely used in numerous applications, such as weather forecast, financial data analysis, spam filtering, and medical prediction. In the meantime, massive data generated from multiple sources further improve the performance of machine learning tools. However, data sharing from multiple sources brings privacy issues for those sources since sensitive information may be leaked in this process. In this paper, we propose a framework enabling multiple parties to collaboratively and accurately train a learning model over distributed datasets while guaranteeing the privacy of data sources. Specifically, we consider logistic regression model for data training and propose two approaches for perturbing the objective function to preserve \( \epsilon \)-differential privacy. The proposed solutions are tested on real datasets, including Bank Marketing and Credit Card Default prediction. Experimental results demonstrate that the proposed multiparty learning framework is highly efficient and accurate.
W. Du—This work was done when Wei Du was at the University of Arkansas.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
sklearn.preprocessing.LabelEncoder. http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
Abadi, M., et al.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. ACM (2016)
Bhaskar, R., Laxman, S., Smith, A., Thakurta, A.: Discovering frequent patterns in sensitive data. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 503–512. ACM (2010)
Bos, J.W., Lauter, K., Naehrig, M.: Private predictive analysis on encrypted medical data. J. Biomed. Inform. 50, 234–243 (2014)
Bouwen, R., Taillieu, T.: Multi-party collaboration as social learning for interdependence: developing relational knowing for sustainable natural resource management. J. Community Appl. Soc. Psychol. 14(3), 137–153 (2004)
Chaudhuri, K., Monteleoni, C.: Privacy-preserving logistic regression. In: Advances in Neural Information Processing Systems, pp. 289–296 (2009)
Dwork, C.: Differential privacy: a survey of results. In: Agrawal, M., Du, D., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 1–19. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79228-4_1
Dwork, C., Roth, A., et al.: The algorithmic foundations of differential privacy. Found. Trends® Theor. Comput. Sci. 9(3–4), 211–407 (2014)
Friedman, A., Schuster, A.: Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 493–502. ACM (2010)
Graepel, T., Lauter, K., Naehrig, M.: ML confidential: machine learning on encrypted data. In: Kwon, T., Lee, M.-K., Kwon, D. (eds.) ICISC 2012. LNCS, vol. 7839, pp. 1–21. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37682-5_1
Heikkilä, M., Okimoto, Y., Kaski, S., Shimizu, K., Honkela, A.: Differentially private Bayesian learning on distributed data. arXiv preprint arXiv:1703.01106 (2017)
Inan, A., Kantarcioglu, M., Bertino, E.: Using anonymized data for classification. In: 2009 IEEE 25th International Conference on Data Engineering, ICDE 2009, pp. 429–440. IEEE (2009)
Kabir, S.M., Youssef, A.M., Elhakeem, A.K.: On data distortion for privacy preserving data mining. In: 2007 Canadian Conference on Electrical and Computer Engineering, CCECE 2007, pp. 308–311. IEEE (2007)
Kotsiantis, S.B., Zaharakis, I., Pintelas, P.: Supervised machine learning: a review of classification techniques (2007)
Kutner, M.H., Nachtsheim, C., Neter, J.: Applied Linear Regression Models. McGraw-Hill/Irwin, New York (2004)
Li, H., Xiong, L., Ohno-Machado, L., Jiang, X.: Privacy preserving RBF kernel support vector machine. BioMed Res. Int. 2014, 1–10 (2014)
Liu, K., Kargupta, H., Ryan, J.: Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Trans. Knowl. Data Eng. 18(1), 92–106 (2006)
McSherry, F., Mironov, I.: Differentially private recommender systems: building privacy into the net. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 627–636. ACM (2009)
Moro, S., Cortez, P., Rita, P.: A data-driven approach to predict the success of bank telemarketing. Decis. Support Syst. 62, 22–31 (2014)
Ohrimenko, O., et al.: Oblivious multi-party machine learning on trusted processors. In: USENIX Security Symposium, pp. 619–636 (2016)
Pathak, M., Rane, S., Raj, B.: Multiparty differential privacy via aggregation of locally trained classifiers. In: Advances in Neural Information Processing Systems, pp. 1876–1884 (2010)
Rajkumar, A., Agarwal, S.: A differentially private stochastic gradient descent algorithm for multiparty classification. In: Artificial Intelligence and Statistics, pp. 933–941 (2012)
Rudin, W., et al.: Principles of Mathematical Analysis, vol. 3. McGraw-hill, New York (1964)
Shobana, S., Nagajothi, P.: Deriving private information from randomized dataset using data reorganization techniques. Data Min. Knowl. Eng. 4(4), 191–194 (2012)
Shokri, R., Shmatikov, V.: Privacy-preserving deep learning. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321. ACM (2015)
Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Cambridge (2016)
Yeh, I.C., Lien, C.H.: The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst. Appl. 36(2), 2473–2480 (2009)
Zhang, J., Zhang, Z., Xiao, X., Yang, Y., Winslett, M.: Functional mechanism: regression analysis under differential privacy. Proc. VLDB Endow. 5(11), 1364–1375 (2012)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Du, W., Li, A., Li, Q. (2018). Privacy-Preserving Multiparty Learning for Logistic Regression. In: Beyah, R., Chang, B., Li, Y., Zhu, S. (eds) Security and Privacy in Communication Networks. SecureComm 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 254. Springer, Cham. https://doi.org/10.1007/978-3-030-01701-9_30
Download citation
DOI: https://doi.org/10.1007/978-3-030-01701-9_30
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01700-2
Online ISBN: 978-3-030-01701-9
eBook Packages: Computer ScienceComputer Science (R0)