Data Mining and Knowledge Discovery

, Volume 28, Issue 5–6, pp 1314–1335 | Cite as

Preserving worker privacy in crowdsourcing

  • Hiroshi Kajino
  • Hiromi Arai
  • Hisashi Kashima


This paper proposes a crowdsourcing quality control method with worker-privacy preservation. Crowdsourcing allows us to outsource tasks to a number of workers. The results of tasks obtained in crowdsourcing are often low-quality due to the difference in the degree of skill. Therefore, we need quality control methods to estimate reliable results from low-quality results. In this paper, we point out privacy problems of workers in crowdsourcing. Personal information of workers can be inferred from the results provided by each worker. To formulate and to address the privacy problems, we define a worker-private quality control problem, a variation of the quality control problem that preserves privacy of workers. We propose a worker-private latent class protocol where a requester can estimate the true results with worker privacy preserved. The key ideas are decentralization of computation and introduction of secure computation. We theoretically guarantee the security of the proposed protocol and experimentally examine the computational efficiency and accuracy.


Crowdsourcing Quality control Privacy-preserving data mining  EM algorithm 



H. Kajino and H. Kashima were supported by the FIRST program.


  1. Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp 439–450Google Scholar
  2. Bernstein M, Chi EH, Chilton L, Hartmann B, Kittur A, Miller RC (2011) Crowdsourcing and human computation: systems, studies and platforms. In: Proceedings of CHI 2011 Workshop on Crowdsourcing and Human Computation, pp 53–56Google Scholar
  3. Burkhart M, Strasser M, Many D, Dimitropoulos X (2010) SEPIA: privacy-preserving aggregation of multi-domain network events and statistics. In: Proceedings of the 19th USENIX Conference on Security, pp 223–240Google Scholar
  4. Damgård I, Jurik M (2001) A Generalisation, a simplification and some applications of Paillier’s probabilistic public-key system. In: Proceedings of the 4th International Workshop on Practice and Theory in Public Key Cryptography: Public Key Cryptography, pp 119–136Google Scholar
  5. Dawid AP, Skene AM (1979) Maximum likelihood estimation of observer error-rates using the EM algorithm. J R Stat Soc Ser C 28(1):20–28Google Scholar
  6. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38MATHMathSciNetGoogle Scholar
  7. Ertekin S, Hirsh H, Rudin C (2012) Learning to predict the wisdom of crowds. In: Proceedings of Collective Intelligence 2012Google Scholar
  8. Lease M (2011) On quality control and machine learning in crowdsourcing. In: Proceedings of the Third Human Computation Workshop, pp 97–102Google Scholar
  9. Lin X, Clifton C, Zhu M (2005) Privacy-preserving clustering with distributed EM mixture modeling. Knowl Inf Syst 8(1):68–81CrossRefGoogle Scholar
  10. Lindell Y, Pinkas B (2000) Privacy preserving data mining. In: Advances in Cryptology-CRYPTO ’00, pp 36–54Google Scholar
  11. Nabar SU, Kenthapadi K, Mishra N, Motwani R (2008) A survey of query auditing techniques for data privacy. In: Privacy-Preserving Data Mining: Models and Algorithms, pp 415–431Google Scholar
  12. Raykar VC, Yu S, Zhao LH, Florin C, Bogoni L, Moy L (2010) Learning from crowds. J Mach Learn Res 11:1297–1322MathSciNetGoogle Scholar
  13. Shamir A (1979) How to share a secret. Commun ACM 22(11):612–613. doi: 10.1145/359168.359176 CrossRefMATHMathSciNetGoogle Scholar
  14. Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? Improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 614–622Google Scholar
  15. Sweeney L (2002) k-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Syst 10(5):557–570. doi: 10.1142/S0218488502001648 CrossRefMATHMathSciNetGoogle Scholar
  16. Varshney LR (2012) Privacy and reliability in crowdsourcing service delivery. In: Proceedings of the 2012 Annual SRII Global Conference, pp 55–60Google Scholar
  17. Welinder P, Branson S, Belongie S, Perona P (2010) The multidimensional wisdom of crowds. Adv Neural Inf Process Syst 23:2424–2432Google Scholar
  18. Whitehill J, Ruvolo P, Wu T, Bergsma J, Movellan J (2009) Whose vote should count more: optimal integration of labels from labelers of unknown expertise. Adv Neural Inf Process Syst 22:2035–2043Google Scholar
  19. Yang B, Sato I, Nakagawa H (2012) Privacy-preserving EM algorithm for clustering on social network. In: Advances in Knowledge Discovery and Data Mining 16th Pacific-Asia Conference, PAKDD 2012, pp 542–553Google Scholar

Copyright information

© The Author(s) 2014

Authors and Affiliations

  1. 1.Department of Mathematical Informatics, Graduate School of Information Science and TechnologyThe University of TokyoTokyo Japan
  2. 2.Advanced Center for Computing and CommunicationRIKENSaitamaJapan
  3. 3.Department of Intelligence Science and Technology, Graduate School of InformaticsKyoto UniversitySakyo-kuJapan

Personalised recommendations