Crowdsourced Operators

Abstract

To obtain high-quality results, different applications require the use of different crowdsourced operators, which have operator-specific optimization goals over three factors: cost, quality, and latency. This chapter reviews how crowdsourced operators (i.e., crowdsourced selection, crowdsourced collection, crowdsourced join, crowdsourced sort, crowdsourced top-k, crowdsourced max/min, crowdsourced aggregation, crowdsourced categorization, crowdsourced skyline, crowdsourced planning, crowdsourced schema matching, crowd mining, spatial crowdsourcing) can be implemented and optimized.

Notes

  1. Each task is assigned to at most five workers, and majority voting is used to aggregate the answers; that is, the result is returned as soon as three consistent answers are obtained.

  2. The symbol “[]” matches anything; its value is ignored.
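
The stopping rule in Note 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function name `aggregate_with_early_stop` and the parameters `max_workers` and `quorum` are hypothetical, and answers are assumed to arrive one worker at a time. When a task has more than two possible answers, five workers may never produce three consistent answers; the sketch returns `None` in that case, which is also an assumption.

```python
from collections import Counter
from itertools import islice


def aggregate_with_early_stop(answers, max_workers=5, quorum=3):
    """Majority voting with early stopping (illustrative sketch).

    answers     -- iterable of worker answers, consumed one at a time
    max_workers -- at most this many workers are asked (five in Note 1)
    quorum      -- number of consistent answers that ends the task (three in Note 1)

    Returns (result, workers_asked); result is None if no answer reaches the quorum.
    """
    counts = Counter()
    asked = 0
    # islice stops pulling answers once max_workers have been consumed.
    for answer in islice(answers, max_workers):
        asked += 1
        counts[answer] += 1
        if counts[answer] >= quorum:
            # Enough consistent answers: stop early and skip the remaining workers.
            return answer, asked
    return None, asked


if __name__ == "__main__":
    print(aggregate_with_early_stop(["yes", "yes", "yes", "no", "no"]))
    print(aggregate_with_early_stop(["yes", "no", "yes", "no", "no"]))
```

With binary answers and the default settings, five workers always yield at least three consistent answers, so the early stop only saves cost and never changes the majority result.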

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Li, G., Wang, J., Zheng, Y., Fan, J., Franklin, M.J. (2018). Crowdsourced Operators. In: Crowdsourced Data Management. Springer, Singapore. https://doi.org/10.1007/978-981-10-7847-7_7

  • DOI: https://doi.org/10.1007/978-981-10-7847-7_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7846-0

  • Online ISBN: 978-981-10-7847-7

  • eBook Packages: Computer Science, Computer Science (R0)
