Crowdsourced Operators

Li, Guoliang; Wang, Jiannan; Zheng, Yudian; Fan, Ju; Franklin, Michael J.

doi:10.1007/978-981-10-7847-7_7

Abstract

To obtain high-quality results, different applications require the use of different crowdsourced operators, which have operator-specific optimization goals over three factors: cost, quality, and latency. This chapter reviews how crowdsourced operators (i.e., crowdsourced selection, crowdsourced collection, crowdsourced join, crowdsourced sort, crowdsourced top-k, crowdsourced max/min, crowdsourced aggregation, crowdsourced categorization, crowdsourced skyline, crowdsourced planning, crowdsourced schema matching, crowd mining, and spatial crowdsourcing) can be implemented and optimized.

Notes

1. Each task is assigned to at most five workers, and majority voting is used to aggregate the answers; that is, the result is returned as soon as three consistent answers are received.
2. The symbol "[]" stands for anything; its value does not matter.
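
The aggregation rule in Note 1 can be illustrated with a minimal sketch. The function name, parameter names, and the choice to return None when no answer reaches three votes are assumptions made for illustration; the note itself does not say what happens in that case, and it cannot arise when tasks have only two possible answers.

```python
from collections import Counter
from itertools import islice
from typing import Hashable, Iterable, Optional, Tuple


def majority_vote_early_stop(
    answers: Iterable[Hashable],
    max_workers: int = 5,   # "five workers at most" from Note 1
    quorum: int = 3,        # "three consistent answers" from Note 1
) -> Tuple[Optional[Hashable], int]:
    """Return (result, number of answers consumed).

    Answers are read one at a time, so no further worker is asked
    once some answer has been given `quorum` times.
    """
    counts: Counter = Counter()
    asked = 0
    for answer in islice(answers, max_workers):
        asked += 1
        counts[answer] += 1
        if counts[answer] >= quorum:
            return answer, asked
    # Assumption: no answer reached the quorum within max_workers answers.
    return None, asked


if __name__ == "__main__":
    print(majority_vote_early_stop(["yes", "yes", "yes", "no", "no"]))
    print(majority_vote_early_stop(["yes", "no", "yes", "no", "no"]))
```

Stopping at the third consistent answer is what saves cost: in the first call above only three of the five possible assignments are used, while the second call needs all five.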

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, Beijing, China
Guoliang Li
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
Jiannan Wang
Twitter Inc., San Francisco, CA, USA
Yudian Zheng
DEKE Lab & School of Information, Renmin University of China, Beijing, Beijing, China
Ju Fan
Department of Computer Science, University of Chicago, Chicago, IL, USA
Michael J. Franklin

About this chapter

Cite this chapter

Li, G., Wang, J., Zheng, Y., Fan, J., Franklin, M.J. (2018). Crowdsourced Operators. In: Crowdsourced Data Management. Springer, Singapore. https://doi.org/10.1007/978-981-10-7847-7_7

DOI: https://doi.org/10.1007/978-981-10-7847-7_7
Published: 13 October 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7846-0
Online ISBN: 978-981-10-7847-7
eBook Packages: Computer Science, Computer Science (R0)