Crowdsourced Operators

  • Guoliang Li
  • Jiannan Wang
  • Yudian Zheng
  • Ju Fan
  • Michael J. Franklin


To obtain high-quality results, different applications require different crowdsourced operators, each with its own optimization goals over three factors: cost, quality, and latency. This chapter reviews how the following crowdsourced operators can be implemented and optimized: selection, collection, join, sort, top-k, max/min, aggregation, categorization, skyline, planning, schema matching, crowd mining, and spatial crowdsourcing.



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Guoliang Li (1)
  • Jiannan Wang (2)
  • Yudian Zheng (3)
  • Ju Fan (4)
  • Michael J. Franklin (5)

  1. Department of Computer Science and Technology, Tsinghua University, Beijing, China
  2. School of Computing Science, Simon Fraser University, Burnaby, Canada
  3. Twitter Inc., San Francisco, USA
  4. DEKE Lab & School of Information, Renmin University of China, Beijing, China
  5. Department of Computer Science, University of Chicago, Chicago, USA
