Crowdsourced Operators

  • Guoliang Li
  • Jiannan Wang
  • Yudian Zheng
  • Ju Fan
  • Michael J. Franklin
Chapter

Abstract

Different applications require different crowdsourced operators to obtain high-quality results, and each operator has its own optimization goals over three factors: cost, quality, and latency. This chapter reviews how the following crowdsourced operators can be implemented and optimized: crowdsourced selection, crowdsourced collection, crowdsourced join, crowdsourced sort, crowdsourced top-k, crowdsourced max/min, crowdsourced aggregation, crowdsourced categorization, crowdsourced skyline, crowdsourced planning, crowdsourced schema matching, crowd mining, and spatial crowdsourcing.

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Guoliang Li (1)
  • Jiannan Wang (2)
  • Yudian Zheng (3)
  • Ju Fan (4)
  • Michael J. Franklin (5)

  1. Department of Computer Science and Technology, Tsinghua University, Beijing, China
  2. School of Computing Science, Simon Fraser University, Burnaby, Canada
  3. Twitter Inc., San Francisco, USA
  4. DEKE Lab & School of Information, Renmin University of China, Beijing, China
  5. Department of Computer Science, University of Chicago, Chicago, USA
