Statistical Quality Control for Human-Based Electronic Services

  • Robert Kern
  • Hans Thies
  • Gerhard Satzger
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6470)


Crowdsourcing in the form of human-based electronic services (people services) provides a powerful way of outsourcing tasks to a large crowd of remote workers over the Internet. Research has shown that multiple redundant results delivered by different workers can be aggregated to achieve a reliable result. However, existing implementations of this approach are rather inefficient, as they multiply the effort for task execution and cannot guarantee a defined quality level. As a starting point towards an integrated approach for quality management of people services, we have developed a quality management model that combines elements of statistical quality control (SQC) with group decision theory. The contributions of the workers are tracked and weighted individually in order to minimize the quality management effort while guaranteeing a well-defined level of overall result quality. A quantitative analysis of the approach based on an optical character recognition (OCR) scenario confirms the efficiency and reach of the approach.
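The aggregation idea described in the abstract can be illustrated with a minimal sketch of a weighted majority vote, in which each worker's answer counts in proportion to an individually tracked reliability weight. This is an illustrative simplification, not the authors' actual model; the function name and data layout are assumptions for the example.

```python
from collections import defaultdict

def weighted_majority_vote(answers, weights):
    """Aggregate redundant worker answers into a single result.

    answers: dict mapping worker id -> submitted answer
    weights: dict mapping worker id -> reliability weight
             (workers without a tracked weight default to 1.0)
    Returns the answer with the largest total weight.
    """
    totals = defaultdict(float)
    for worker, answer in answers.items():
        totals[answer] += weights.get(worker, 1.0)
    # The answer backed by the most cumulative worker weight wins.
    return max(totals, key=totals.get)
```

With equal weights this reduces to a plain majority vote; once a worker has demonstrated higher reliability, a single answer from that worker can outweigh several less reliable ones, which is what reduces the redundancy required per task.

```python
# One highly reliable worker outvotes two less reliable ones:
weighted_majority_vote({"w1": "X", "w2": "Y", "w3": "Y"},
                       {"w1": 2.5, "w2": 1.0, "w3": 1.0})
```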


Keywords: Optical Character Recognition · Statistical Quality Control · Acceptance Sampling · Amazon Mechanical Turk · Weighted Majority Vote



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Robert Kern¹
  • Hans Thies¹
  • Gerhard Satzger¹
  1. Karlsruhe Service Research Institute, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
