Classification Learning from Private Data in Heterogeneous Settings

  • Yiwen Nie
  • Shaowei Wang
  • Wei Yang
  • Liusheng Huang
  • Zhenhua Zhao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10828)


Classification assigns labels to data. Although well-trained classifiers benefit many applications, training them on user-contributed data may leak users’ private information.

This work studies private model training in heterogeneous settings, specifically for the Naïve Bayes Classifier (NBC). Unlike previous works, which focus on centralized and consistent datasets, we consider private training in two more practical settings: the local setting and the mixture setting. In the local setting, individuals contribute training tuples directly to an untrusted trainer. In the mixture setting, the training dataset is composed of individual tuples and statistics of datasets held by institutes. We propose a randomized-response-based NBC strategy for the local setting. To protect the heterogeneous data (single tuples and statistics) in the mixture setting, we design a unified privatization scheme that integrates the respective sanitization strategies for the two data types while preserving privacy. Besides providing error bounds for the estimated probabilities that constitute the NBC, we prove their optimality in the minimax framework and quantify the classification error of the privately learned NBC. Our analyses are validated with extensive experiments on real-world datasets.
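As a rough illustration of the randomized response idea mentioned for the local setting, the sketch below perturbs each user's class label with k-ary randomized response and then debiases the observed frequencies to estimate class priors, one of the probabilities a Naïve Bayes classifier needs. It is a generic textbook construction, not the scheme proposed in the paper; the function names `krr_perturb` and `krr_estimate`, the privacy parameter `epsilon`, and the toy label domain are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def krr_perturb(value, domain, epsilon, rng):
    """Report the true value with probability p, else a uniformly chosen other value."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def krr_estimate(reports, domain, epsilon):
    """Debias observed report frequencies into unbiased estimates of the true frequencies."""
    k = len(domain)
    e = math.exp(epsilon)
    p = e / (e + k - 1)    # probability of reporting the true value
    q = 1.0 / (e + k - 1)  # probability of reporting one specific other value
    n = len(reports)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


# Hypothetical toy data: 70% of users hold the label "yes".
rng = random.Random(0)
domain = ["yes", "no"]
true_labels = ["yes"] * 14000 + ["no"] * 6000
reports = [krr_perturb(v, domain, 1.0, rng) for v in true_labels]
priors = krr_estimate(reports, domain, 1.0)
```

Each user only ever sends the perturbed label, so the trainer never sees raw data. The same estimator could in principle be applied to (label, feature value) pairs to approximate the conditional probabilities, at the cost of a larger domain and hence more noise per estimate.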


Differential privacy, Classification



This work was supported by the National Natural Science Foundation of China (No. 61572456), the Anhui Province Guidance Funds for Quantum Communication and Quantum Computers and the Natural Science Foundation of Jiangsu Province of China (No. BK20151241).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Yiwen Nie (1)
  • Shaowei Wang (1)
  • Wei Yang (1)
  • Liusheng Huang (1)
  • Zhenhua Zhao (1)
  1. University of Science and Technology of China, Hefei, China
