International Journal of Information Security, Volume 17, Issue 4, pp. 365–377

Supervised machine learning using encrypted training data

  • Francisco-Javier González-Serrano
  • Adrián Amor-Martín
  • Jorge Casamayón-Antón
Regular Contribution

Abstract

Preservation of privacy in data mining and machine learning has emerged as an absolute prerequisite in many practical scenarios, especially when the processing of sensitive data is outsourced to an external third party. Currently, privacy preservation methods are mainly based on randomization and/or perturbation, secure multiparty computation, and cryptographic methods. In this paper, we take advantage of the partially homomorphic property of some cryptosystems to train simple machine learning models with encrypted data. Our basic scenario has three parties: multiple Data Owners, who provide encrypted training examples; the Algorithm Owner (or Application), which processes them to adjust the parameters of its models; and a semi-trusted third party, which provides privacy and secure computation services to the Application for some operations not supported by the homomorphic cryptosystem. In particular, we focus on two issues: the use of multiple-key cryptosystems, and the impact of the quantization that real-valued input data require before encryption. In addition, we develop primitives, based on outsourcing a reduced set of operations, that allow general machine learning algorithms to be implemented using efficient dedicated hardware. As applications, we consider the training of classifiers on privacy-protected data and the tracking of a moving target using encrypted distance measurements.
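The abstract combines three technical ingredients: quantizing real-valued inputs before encryption, computing on ciphertexts with an additively homomorphic cryptosystem, and asking a semi-trusted third party to perform an operation the cryptosystem cannot do on its own (here, multiplying two ciphertexts). The Python sketch below illustrates these ideas under assumptions that are not taken from the paper: it uses a single-key textbook Paillier scheme with a deliberately tiny, insecure key; the fixed-point precision `FRAC_BITS = 16` and all function names (`quantize`, `encrypted_dot`, `ttp_multiply`, `secure_multiply`) are hypothetical; and the third-party step is a generic additive-blinding exchange, not necessarily the primitive the authors define. Multiple-key operation and dedicated hardware are not modeled.

```python
import math
import secrets

# Toy Paillier key built from two Mersenne primes (2**31 - 1 and 2**61 - 1).
# Far too small and too structured for real security; illustration only.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1

FRAC_BITS = 16  # hypothetical fixed-point precision for quantization


def quantize(x: float, frac_bits: int = FRAC_BITS) -> int:
    """Map a real value to an integer by scaling with 2**frac_bits and rounding."""
    return round(x * (1 << frac_bits))


def encrypt(m: int) -> int:
    """Paillier encryption with g = N + 1; negative m is stored modulo N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Paillier decryption, returning the centred residue (negatives restored)."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def add(c1: int, c2: int) -> int:
    """Enc(a) * Enc(b) mod N^2 is an encryption of a + b."""
    return c1 * c2 % N2


def scale(c: int, k: int) -> int:
    """Enc(a) ** k mod N^2 is an encryption of k * a; k is reduced mod N."""
    return pow(c, k % N, N2)


def encrypted_dot(w_q, enc_x) -> int:
    """Enc(sum_i w_i * x_i) from plaintext integer weights and encrypted features."""
    acc = encrypt(0)
    for w, c in zip(w_q, enc_x):
        acc = add(acc, scale(c, w))
    return acc


def ttp_multiply(c_blind_a: int, c_blind_b: int) -> int:
    """Semi-trusted party (holds the private key): sees only blinded values."""
    return encrypt(decrypt(c_blind_a) * decrypt(c_blind_b))


def secure_multiply(ca: int, cb: int) -> int:
    """Application side: obtain Enc(a * b) from Enc(a), Enc(b) via the third party."""
    ra, rb = secrets.randbelow(N), secrets.randbelow(N)
    c_blind = ttp_multiply(add(ca, encrypt(ra)), add(cb, encrypt(rb)))
    # (a + ra)(b + rb) - rb*a - ra*b - ra*rb = a*b  (mod N)
    c = add(c_blind, scale(ca, -rb))
    c = add(c, scale(cb, -ra))
    return add(c, encrypt(-ra * rb))


if __name__ == "__main__":
    x = [0.75, -1.5, 2.25]  # made-up features from a Data Owner
    w = [0.5, 0.25, -1.0]   # made-up current weights of the Application
    enc_x = [encrypt(quantize(v)) for v in x]
    enc_score = encrypted_dot([quantize(v) for v in w], enc_x)
    # Each product of two quantized values carries a 2**(2*FRAC_BITS) scale.
    print(decrypt(enc_score) / (1 << 2 * FRAC_BITS))  # -2.25
    enc_prod = secure_multiply(enc_x[0], enc_x[1])
    print(decrypt(enc_prod) / (1 << 2 * FRAC_BITS))   # -1.125
```

Because each quantized factor carries a scale of 2**FRAC_BITS, a product of two of them carries 2**(2*FRAC_BITS), which is why both results are divided by that factor; a larger FRAC_BITS reduces quantization error but uses more of the plaintext space modulo N. In this sketch the third party holds the private key but only ever decrypts a + r_a and b + r_b, where r_a and r_b are drawn uniformly modulo N.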

Keywords

Classification; Security integrity protection; Machine learning; Privacy protection; Homomorphic encryption

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés, Spain
