Applied Intelligence, Volume 48, Issue 2, pp 381–389

A maximum-likelihood and moment-matching density estimator for crowd-sourcing label prediction

  • Minyoung Kim


We address parameter estimation for probability density models with latent variables. The expectation maximization (EM) algorithm has traditionally been the standard tool for this problem. However, it can converge to poor local maxima, and the quality of the estimator is sensitive to the choice of initial model. Recently, an alternative density estimator has been proposed that matches the sample-averaged moments to the model-averaged moments. This moment-matching estimator is typically used as the initial iterate for the EM algorithm, which then refines it. However, there is no guarantee that the EM-refined estimator still yields moments close to the sample-averaged ones. Motivated by this issue, we propose a novel estimator that combines the merits of both approaches: we maximize the likelihood, but use the moment discrepancy score as a regularizer that prevents the model-averaged moments from straying from those estimated from the data. On several crowd-sourcing label prediction problems, we demonstrate that the proposed approach yields more accurate density estimates than existing estimators.
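The idea of a likelihood objective regularized by a moment discrepancy can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy model (a two-component, unit-variance Gaussian mixture rather than a crowd-sourcing model), the choice of first and second moments, the squared-error discrepancy, and the hypothetical weight `lam`.

```python
import numpy as np

def log_likelihood(x, w, mu1, mu2):
    """Average log-likelihood of a two-component, unit-variance Gaussian mixture."""
    comp1 = w * np.exp(-0.5 * (x - mu1) ** 2)
    comp2 = (1.0 - w) * np.exp(-0.5 * (x - mu2) ** 2)
    return np.mean(np.log((comp1 + comp2) / np.sqrt(2.0 * np.pi)))

def model_moments(w, mu1, mu2):
    """Model-averaged first and second moments of the toy mixture."""
    m1 = w * mu1 + (1.0 - w) * mu2
    m2 = w * (mu1 ** 2 + 1.0) + (1.0 - w) * (mu2 ** 2 + 1.0)
    return np.array([m1, m2])

def sample_moments(x):
    """Sample-averaged first and second moments of the data."""
    return np.array([np.mean(x), np.mean(x ** 2)])

def regularized_objective(x, w, mu1, mu2, lam=1.0):
    """Log-likelihood minus lam times the squared moment discrepancy (assumed form)."""
    gap = sample_moments(x) - model_moments(w, mu1, mu2)
    return log_likelihood(x, w, mu1, mu2) - lam * np.sum(gap ** 2)

# Toy data drawn from a balanced mixture centred at -2 and +2.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 1000), rng.normal(2.0, 1.0, 1000)])
score_good = regularized_objective(x, 0.5, -2.0, 2.0)
score_bad = regularized_objective(x, 0.5, 0.0, 0.0)
```

With `lam=0` the objective reduces to the plain average log-likelihood; larger values of `lam` penalize parameter settings whose model-averaged moments drift from the sample-averaged ones. Any optimizer could be applied to this objective; the sketch only evaluates it.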


Density estimation, Expectation maximization, Moment matching, Crowd-sourcing label prediction problem



This work is supported by the National Research Foundation of Korea (NRF-2016R1A1A1A05921948).

Compliance with Ethical Standards

Conflict of interests

The authors have no conflict of interest.

Consent for Publication

This research involves neither human participants nor animals. Consent to submit this manuscript has been received tacitly from the authors’ institution, Seoul National University of Science & Technology.



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Department of Electronics & IT Media Engineering, Seoul National University of Science & Technology, Seoul, Korea
