Classification Learning from Private Data in Heterogeneous Settings

Nie, Yiwen; Wang, Shaowei; Yang, Wei; Huang, Liusheng; Zhao, Zhenhua

doi:10.1007/978-3-319-91458-9_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10828)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Classification is useful for mining labels of data. Although well-trained classifiers benefit many applications, training them on user-contributed data may leak users' private information.

This work studies methods for private model training in heterogeneous settings, specifically for the Naïve Bayes Classifier (NBC). Unlike previous work, which focuses on centralized and consistent datasets, we consider private training in two more practical settings: the local setting and the mixture setting. In the local setting, individuals contribute training tuples directly to an untrusted trainer. In the mixture setting, the training dataset is composed of individual tuples and of statistics of datasets held by institutions. For the local setting, we propose an NBC strategy based on randomized response. To protect the heterogeneous data in the mixture setting (single tuples and statistics), we design a unified privatization scheme that integrates the respective sanitization strategies for the two data types while preserving privacy. In addition to deriving error bounds for the estimated probabilities that constitute the NBC, we prove that these bounds are optimal in the minimax framework and quantify the classification error of the privately learned NBC. Our analyses are validated with extensive experiments on real-world datasets.
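The abstract does not spell out the randomized response mechanism, so the following is only a minimal sketch of the general idea, not the authors' scheme. It assumes generic k-ary randomized response: each individual perturbs a single categorical value (for example, a class label) before sending it to the untrusted trainer, and the trainer debiases the noisy counts to estimate a probability of the kind an NBC needs. The function names `randomize` and `estimate_frequencies`, and the privacy parameter `epsilon`, are illustrative assumptions and do not come from the paper.

```python
import math
import random
from collections import Counter


def randomize(value, domain, epsilon, rng):
    """Perturb one categorical value with k-ary randomized response.

    The true value is kept with probability p = e^eps / (e^eps + k - 1);
    otherwise one of the other k - 1 values is reported uniformly at random.
    Assumes len(domain) >= 2 and epsilon > 0.
    """
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def estimate_frequencies(reports, domain, epsilon):
    """Debias the noisy reports into unbiased frequency estimates.

    A report equals v with probability f_v * p + (1 - f_v) * q, where
    q = 1 / (e^eps + k - 1), so f_v = (observed_v - q) / (p - q).
    Individual estimates can fall slightly outside [0, 1].
    """
    k = len(domain)
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    q = 1.0 / (e + k - 1)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}
```

Under these assumptions, a trainer could call `randomize` once per contributed label on the client side and `estimate_frequencies` on the collected reports to approximate class priors. Conditional probabilities over attribute values would need a similar treatment, which this sketch does not cover.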

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61572456), the Anhui Province Guidance Funds for Quantum Communication and Quantum Computers and the Natural Science Foundation of Jiangsu Province of China (No. BK20151241).

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, China
Yiwen Nie, Shaowei Wang, Wei Yang, Liusheng Huang & Zhenhua Zhao

Corresponding author

Correspondence to Yiwen Nie.

Editor information

Editors and Affiliations

Simon Fraser University, Burnaby, BC, Canada
Jian Pei
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
University of Queensland, Brisbane, QLD, Australia
Shazia Sadiq
University of Western Australia, Crawley, WA, Australia
Jianxin Li

About this paper

Cite this paper

Nie, Y., Wang, S., Yang, W., Huang, L., Zhao, Z. (2018). Classification Learning from Private Data in Heterogeneous Settings. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10828. Springer, Cham. https://doi.org/10.1007/978-3-319-91458-9_35

DOI: https://doi.org/10.1007/978-3-319-91458-9_35
Published: 12 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91457-2
Online ISBN: 978-3-319-91458-9
eBook Packages: Computer Science, Computer Science (R0)
