# A maximum-likelihood and moment-matching density estimator for crowd-sourcing label prediction

## Abstract

We deal with the parameter estimation problem for probability density models with latent variables. For this problem traditionally the expectation maximization (EM) algorithm has been broadly used. However, it suffers from bad local maxima, and the quality of the estimator is sensitive to the initial model choice. Recently, an alternative density estimator has been proposed that is based on matching the moments between sample averaged and model averaged. This moment matching estimator is typically used as the initial iterate for the EM algorithm for further refinement. However, there is actually no guarantee that the EM-refined estimator still yields the moments close enough to the sample-averaged one. Motivated by this issue, in this paper we propose a novel estimator that takes merits of both worlds: we do likelihood maximization, but the moment discrepancy score is used as a regularizer that prevents the model-averaged moments from straying away from those estimated from data. On some crowd-sourcing label prediction problems, we demonstrate that the proposed approach yields more accurate density estimates than the existing estimators.

## Keywords

Density estimation Expectation maximization Moment matching Crowd-sourcing label prediction problem## Notes

### Acknowledgements

This work is supported by National Research Foundation of Korea (NRF-2016R1A1A1A05921948)

### Compliance with Ethical Standards

### Conflict of interests

The authors have no conflict of interest.

### Consent for Publication

This research does not involve human participants nor animals. Consent to submit this manuscript has been received tacitly from the authors’ institution, Seoul National University of Science & Technology.

## References

- 1.Anandkumar A, Foster DP, Hsu D, Kakade SM, Liu YK (2015) A spectral algorithm for latent Dirichlet allocation. Algorithmica 72(1):193–214MathSciNetCrossRefMATHGoogle Scholar
- 2.Anandkumar A, Ge R, Hsu D, Kakade SM, Telgarsky M (2014) Tensor decompositions for learning latent variable models. J Mach Learn Res 15:2773–2832MathSciNetMATHGoogle Scholar
- 3.Anandkumar A, Hsu D, Kakade SM (2012) A method of moments for mixture models and hidden Markov models. In: 25th annual conference on learning theoryGoogle Scholar
- 4.Belkin M, Sinha K (2015) Polynomial learning of distribution families. SIAM J Comput 44(4):889–911MathSciNetCrossRefMATHGoogle Scholar
- 5.Bishop C (2007) Pattern recognition and machine learning. Springer, BerlinMATHGoogle Scholar
- 6.Dalvi N, Dasgupta A, Kumar R, Rastogi V (2013) Aggregating crowdsourced binary ratings. In: Proceedings of world wide web conferenceGoogle Scholar
- 7.Dawid AP, Skene AM (1979) Maximum likelihood estimation of observer error-rates using the EM algorithm. J R Stat Soc Ser C 20–28Google Scholar
- 8.Debole F, Sebastiani F (2003) Supervised term weighting for automated text categorization. In: Proceedings of the ACM symposium on Applied computingGoogle Scholar
- 9.Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc Ser B 39(1):1–38MathSciNetMATHGoogle Scholar
- 10.Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: IEEE international conference on computer vision and pattern recognitionGoogle Scholar
- 11.Deng ZH, Tang SW, Yang DQ, Li MZLY, Xie KQ (2004) A comparative study on feature weight in text categorization. Advanced Web Technologies and Applications. Lect Notes Comput Sci 3007:588–597CrossRefGoogle Scholar
- 12.Diamond S, Boyd S (2016) Cvxpy: a python-embedded modeling language for convex optimization. J Mach Learn Res 17(83):1–5MathSciNetMATHGoogle Scholar
- 13.Ghosh A, Kale S, McAfee P (2011) Who moderates the moderators? Crowdsourcing abuse detection in user-generated content. In Proceedings of the ACM conference on electronic commerceGoogle Scholar
- 14.Hsu D, Kakade SM (2013) Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In: Proceedings of the 4th conference on innovations in theoretical computer scienceGoogle Scholar
- 15.Liu Q, Peng J, Ihler AT (2012) Variational inference for crowdsourcing. In: Advances in neural information processing systemsGoogle Scholar
- 16.Lofberg J (2004) YALMIP: a toolbox for modeling and optimization in MATLAB. In: Proceedings of the IEEE international symposium on computed aided control systems designGoogle Scholar
- 17.Moitra A, Valiant G (2010) Settling the polynomial learnability of mixtures of Gaussians. In: 51st annual IEEE symposium on foundations of computer scienceGoogle Scholar
- 18.Raghunathan A, Frostig R, Duchi J, Liang P (2016) Estimation from indirect supervision with linear moments. In: International conference on machine learningGoogle Scholar
- 19.Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L (2010) Learning from crowds. J Mach Learn Res 11:1297–1322MathSciNetGoogle Scholar
- 20.Sarkar P, Siddiqi SM, Gordon GJ (2007) A latent space approach to dynamic embedding of co-occurrence data. In: Proceedings of the 11th international conference on artificial intelligence and statisticsGoogle Scholar
- 21.Snow R, O’Connor B, Jurafsky D, Ng AY (2008) Cheap and fast? But is it good?: Evaluating non-expert annotations for natural language tasks. In: Proceedings of the conference on empirical methods in natural language processingGoogle Scholar
- 22.Sorensen DC (1982) Newton’s method with a model trust region modification. SIAM J Numer Anal 19 (2):409–426MathSciNetCrossRefMATHGoogle Scholar
- 23.Wang Y, Xie B, Song L (2016) Isotonic Hawkes processes. In: International conference on machine learningGoogle Scholar
- 24.Xiang Yuan Y (2015) Recent advances in trust region algorithms. Math Program 151(1):249–281MathSciNetCrossRefMATHGoogle Scholar
- 25.Zhang Y, Chen X, Zhou D, Jordan MI (2014) Spectral methods meet EM: a provably optimal algorithm for crowdsourcing. In: Advances in neural information processing systemsGoogle Scholar
- 26.Zhou D, Liu Q, Platt JC, Meek C (2014) Aggregating ordinal labels from crowds by minimax conditional entropy. In: International conference on machine learningGoogle Scholar
- 27.Zhou D, Platt JC, Basu S, Mao Y (2012) Learning from the wisdom of crowds by minimax entropy. In: Advances in neural information processing systemsGoogle Scholar