• Guoliang Li
  • Jiannan Wang
  • Yudian Zheng
  • Ju Fan
  • Michael J. Franklin


Crowdsourcing has been widely used to enhance many data management and analytics tasks. This chapter introduces the motivation, overview, and research challenges of crowdsourced data management. Section 1.1 provides a motivation of crowdsourcing, and Sect. 1.2 gives a brief overview of crowdsourcing. Finally, Sect. 1.3 introduces research challenges of crowdsourced data management and summarizes existing works.


  1. 1.
    Amazon mechanical turk.
  2. 2.
  3. 3.
  4. 4.
    von Ahn, L., Dabbish, L.: ESP: labeling images with a computer game. In: AAAI, pp. 91–98 (2005)Google Scholar
  5. 5.
    Amsterdamer, Y., Davidson, S., Kukliansky, A., Milo, T., Novgorodov, S., Somech, A.: Managing general and individual knowledge in crowd mining applications. In: CIDR (2015)Google Scholar
  6. 6.
    Amsterdamer, Y., Davidson, S.B., Milo, T., Novgorodov, S., Somech, A.: Oassis: query driven crowd mining. In: SIGMOD, pp. 589–600. ACM (2014)Google Scholar
  7. 7.
    Amsterdamer, Y., Davidson, S.B., Milo, T., Novgorodov, S., Somech, A.: Ontology assisted crowd mining. PVLDB 7(13), 1597–1600 (2014)Google Scholar
  8. 8.
    Amsterdamer, Y., Grossman, Y., Milo, T., Senellart, P.: Crowd mining. In: SIGMOD, pp. 241–252. ACM (2013)Google Scholar
  9. 9.
    Amsterdamer, Y., Grossman, Y., Milo, T., Senellart, P.: Crowdminer: Mining association rules from the crowd. PVLDB 6(12), 1250–1253 (2013)Google Scholar
  10. 10.
    Chen, X., Bennett, P.N., Collins-Thompson, K., Horvitz, E.: Pairwise ranking aggregation in a crowdsourced setting. In: WSDM, pp. 193–202 (2013)Google Scholar
  11. 11.
    Davidson, S.B., Khanna, S., Milo, T., Roy, S.: Using the crowd for top-k and group-by queries. In: ICDT, pp. 225–236 (2013)Google Scholar
  12. 12.
    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A Large-Scale Hierarchical Image Database. In: CVPR (2009)Google Scholar
  13. 13.
    Eiben, C.B., Siegel, J.B., Bale, J.B., Cooper, S., Khatib, F., Shen, B.W., Players, F., Stoddard, B.L., Popovic, Z., Baker, D.: Increased diels-alderase activity through backbone remodeling guided by foldit players. Nature biotechnology 30(2), 190–192 (2012)CrossRefGoogle Scholar
  14. 14.
    Eriksson, B.: Learning to top-k search using pairwise comparisons. In: AISTATS, pp. 265–273 (2013)Google Scholar
  15. 15.
    Fan, J., Lu, M., Ooi, B.C., Tan, W.C., Zhang, M.: A hybrid machine-crowdsourcing system for matching web tables. In: ICDE, pp. 976–987. IEEE (2014)Google Scholar
  16. 16.
    Fan, J., Zhang, M., Kok, S., Lu, M., Ooi, B.C.: Crowdop: Query optimization for declarative crowdsourcing systems. IEEE Trans. Knowl. Data Eng. 27(8), 2078–2092 (2015)CrossRefGoogle Scholar
  17. 17.
    Fang, Y., Sun, H., Li, G., Zhang, R., Huai, J.: Effective result inference for context-sensitive tasks in crowdsourcing. In: DASFAA, pp. 33–48 (2016)CrossRefGoogle Scholar
  18. 18.
    Franklin, M.J., Kossmann, D., Kraska, T., Ramesh, S., Xin, R.: Crowddb: answering queries with crowdsourcing. In: SIGMOD, pp. 61–72 (2011)Google Scholar
  19. 19.
    Gokhale, C., Das, S., Doan, A., Naughton, J.F., Rampalli, N., Shavlik, J.W., Zhu, X.: Corleone: hands-off crowdsourcing for entity matching. In: SIGMOD, pp. 601–612 (2014)Google Scholar
  20. 20.
    Groz, B., Milo, T.: Skyline queries with noisy comparisons. In: PODS, pp. 185–198 (2015)Google Scholar
  21. 21.
    Guo, S., Parameswaran, A.G., Garcia-Molina, H.: So who won?: dynamic max discovery with the crowd. In: SIGMOD, pp. 385–396 (2012)Google Scholar
  22. 22.
    Heikinheimo, H., Ukkonen, A.: The crowd-median algorithm. In: HCOMP (2013)Google Scholar
  23. 23.
    Ipeirotis, P., Provost, F., Wang, J.: Quality management on amazon mechanical turk. In: SIGKDD Workshop, pp. 64–67 (2010)Google Scholar
  24. 24.
    Kaplan, H., Lotosh, I., Milo, T., Novgorodov, S.: Answering planning queries with the crowd. PVLDB 6(9), 697–708 (2013)Google Scholar
  25. 25.
    Khan, A.R., Garcia-Molina, H.: Hybrid strategies for finding the max with the crowd. Tech. rep. (2014)Google Scholar
  26. 26.
    Li, G.: Human-in-the-loop data integration. PVLDB 10(12), 2006–2017 (2017)Google Scholar
  27. 27.
    Li, G., Chai, C., Fan, J., Weng, X., Li, J., Zheng, Y., Li, Y., Yu, X., Zhang, X., Yuan, H.: CDB: optimizing queries with crowd-based selections and joins. In: SIGMOD, pp. 1463–1478 (2017)Google Scholar
  28. 28.
    Li, G., Wang, J., Zheng, Y., Franklin, M.J.: Crowdsourced data management: A survey. TKDE 28(9), 2296–2319 (2016)Google Scholar
  29. 29.
    Liu, Q., Peng, J., Ihler, A.T.: Variational inference for crowdsourcing. In: NIPS, pp. 701–709 (2012)Google Scholar
  30. 30.
    Liu, X., Lu, M., Ooi, B.C., Shen, Y., Wu, S., Zhang, M.: CDAS: A crowdsourcing data analytics system. PVLDB 5(10), 1040–1051 (2012)Google Scholar
  31. 31.
    Lofi, C., Maarry, K.E., Balke, W.: Skyline queries in crowd-enabled databases. In: EDBT, pp. 465–476 (2013)Google Scholar
  32. 32.
    Lofi, C., Maarry, K.E., Balke, W.: Skyline queries over incomplete data - error models for focused crowd-sourcing. In: ER, pp. 298–312 (2013)Google Scholar
  33. 33.
    Marcus, A., Karger, D.R., Madden, S., Miller, R., Oh, S.: Counting with the crowd. PVLDB 6(2), 109–120 (2012)Google Scholar
  34. 34.
    Marcus, A., Wu, E., Madden, S., Miller, R.C.: Crowdsourced databases: Query processing with people. In: CIDR, pp. 211–214 (2011)Google Scholar
  35. 35.
    Mozafari, B., Sarkar, P., Franklin, M., Jordan, M., Madden, S.: Scaling up crowd-sourcing to very large datasets: a case for active learning. PVLDB 8(2), 125–136 (2014)Google Scholar
  36. 36.
    Nguyen, Q.V.H., Nguyen, T.T., Miklós, Z., Aberer, K., Gal, A., Weidlich, M.: Pay-as-you-go reconciliation in schema matching networks. In: ICDE, pp. 220–231. IEEE (2014)Google Scholar
  37. 37.
    Parameswaran, A.G., Boyd, S., Garcia-Molina, H., Gupta, A., Polyzotis, N., Widom, J.: Optimal crowd-powered rating and filtering algorithms. PVLDB 7(9), 685–696 (2014)Google Scholar
  38. 38.
    Parameswaran, A.G., Garcia-Molina, H., Park, H., Polyzotis, N., Ramesh, A., Widom, J.: Crowdscreen: algorithms for filtering data with humans. In: SIGMOD, pp. 361–372 (2012)Google Scholar
  39. 39.
    Parameswaran, A.G., Sarma, A.D., Garcia-Molina, H., Polyzotis, N., Widom, J.: Human-assisted graph search: it’s okay to ask questions. PVLDB 4(5), 267–278 (2011)Google Scholar
  40. 40.
    Park, H., Pang, R., Parameswaran, A.G., Garcia-Molina, H., Polyzotis, N., Widom, J.: Deco: A system for declarative crowdsourcing. PVLDB 5(12), 1990–1993 (2012)Google Scholar
  41. 41.
    Park, H., Widom, J.: Crowdfill: collecting structured data from the crowd. In: SIGMOD, pp. 577–588 (2014)Google Scholar
  42. 42.
    Pfeiffer, T., Gao, X.A., Chen, Y., Mao, A., Rand, D.G.: Adaptive polling for information aggregation. In: AAAI (2012)Google Scholar
  43. 43.
    Sarma, A.D., Parameswaran, A.G., Garcia-Molina, H., Halevy, A.Y.: Crowd-powered find algorithms. In: ICDE, pp. 964–975 (2014)Google Scholar
  44. 44.
    Smyth, P., Fayyad, U.M., B truth from subjective labelling of venus images. In: NIPS, pp. 1085–1092 (1994)Google Scholar
  45. 45.
    Su, H., Zheng, K., Huang, J., Jeung, H., Chen, L., Zhou, X.: Crowdplanner: A crowd-based route recommendation system. In: ICDE, pp. 1144–1155. IEEE (2014)Google Scholar
  46. 46.
    Talamadupula, K., Kambhampati, S., Hu, Y., Nguyen, T.A., Zhuo, H.H.: Herding the crowd: Automated planning for crowdsourced planning. In: HCOMP (2013)Google Scholar
  47. 47.
    To, H., Ghinita, G., Shahabi, C.: A framework for protecting worker location privacy in spatial crowdsourcing. PVLDB 7(10), 919–930 (2014)Google Scholar
  48. 48.
    Trushkowsky, B., Kraska, T., Franklin, M.J., Sarkar, P.: Crowdsourced enumeration queries. In: ICDE, pp. 673–684 (2013)Google Scholar
  49. 49.
    Venetis, P., Garcia-Molina, H., Huang, K., Polyzotis, N.: Max algorithms in crowdsourcing environments. In: WWW, pp. 989–998 (2012)Google Scholar
  50. 50.
    Vesdapunt, N., Bellare, K., Dalvi, N.N.: Crowdsourcing algorithms for entity resolution. PVLDB 7(12), 1071–1082 (2014)Google Scholar
  51. 51.
    Von Ahn, L., Maurer, B., McMillen, C., Abraham, D., Blum, M.: recaptcha: Human-based character recognition via web security measures. Science 321(5895), 1465–1468 (2008)MathSciNetCrossRefGoogle Scholar
  52. 52.
    Wang, J., Kraska, T., Franklin, M.J., Feng, J.: CrowdER: crowdsourcing entity resolution. PVLDB 5(11), 1483–1494 (2012)Google Scholar
  53. 53.
    Wang, J., Krishnan, S., Franklin, M.J., Goldberg, K., Kraska, T., Milo, T.: A sample-and-clean framework for fast and accurate query processing on dirty data. In: SIGMOD, pp. 469–480 (2014)Google Scholar
  54. 54.
    Wang, J., Li, G., Kraska, T., Franklin, M.J., Feng, J.: Leveraging transitive relations for crowdsourced joins. In: SIGMOD, pp. 229–240 (2013)Google Scholar
  55. 55.
    Wang, S., Xiao, X., Lee, C.: Crowd-based deduplication: An adaptive approach. In: SIGMOD, pp. 1263–1277 (2015)Google Scholar
  56. 56.
    Welinder, P., Perona, P.: Online crowdsourcing: rating annotators and obtaining cost-effective labels. In: CVPR Workshop (ACVHL), pp. 25–32. IEEE (2010)Google Scholar
  57. 57.
    Whang, S.E., Lofgren, P., Garcia-Molina, H.: Question selection for crowd entity resolution. PVLDB 6(6), 349–360 (2013)Google Scholar
  58. 58.
    Whitehill, J., Ruvolo, P., Wu, T., Bergsma, J., Movellan, J.R.: Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In: NIPS, pp. 2035–2043 (2009)Google Scholar
  59. 59.
    Yan, T., Kumar, V., Ganesan, D.: Crowdsearch: exploiting crowds for accurate real-time image search on mobile phones. In: MobiSys, pp. 77–90 (2010)Google Scholar
  60. 60.
    Ye, P., EDU, U., Doermann, D.: Combining preference and absolute judgements in a crowd-sourced setting. In: ICML Workshop (2013)Google Scholar
  61. 61.
    Zhang, C.J., Chen, L., Jagadish, H.V., Cao, C.C.: Reducing uncertainty of schema matching via crowdsourcing. PVLDB 6(9), 757–768 (2013)Google Scholar
  62. 62.
    Zhang, C.J., Tong, Y., Chen, L.: Where to: Crowd-aided path selection. PVLDB 7(14), 2005–2016 (2014)Google Scholar
  63. 63.
    Zheng, Y., Wang, J., Li, G., Cheng, R., Feng, J.: QASCA: A quality-aware task assignment system for crowdsourcing applications. In: SIGMOD, pp. 1031–1046 (2015)Google Scholar
  64. 64.
    Zhuo, H.H.: Crowdsourced action-model acquisition for planning. In: AAAI, pp. 3439–3446Google Scholar

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Guoliang Li
    • 1
  • Jiannan Wang
    • 2
  • Yudian Zheng
    • 3
  • Ju Fan
    • 4
  • Michael J. Franklin
    • 5
  1. 1.Department of Computer Science and TechnologyTsinghua UniversityBeijingChina
  2. 2.School of Computing ScienceSimon Fraser UniversityBurnabyCanada
  3. 3.Twitter Inc.San FranciscoUSA
  4. 4.DEKE Lab & School of InformationRenmin University of ChinaBeijingChina
  5. 5.Department of Computer ScienceUniversity of ChicagoChicagoUSA

Personalised recommendations