Abstract
To obtain high-quality results, different applications require the use of different crowdsourced operators, which have operator-specific optimization goals over three factors: cost, quality, and latency. This chapter reviews how crowdsourced operators (i.e., crowdsourced selection, crowdsourced collection, crowdsourced join, crowdsourced sort, crowdsourced top-k, crowdsourced max/min, crowdsourced aggregation, crowdsourced categorization, crowdsourced skyline, crowdsourced planning, crowdsourced schema matching, crowd mining, spatial crowdsourcing) can be implemented and optimized.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Each task is assigned to five workers at most, and majority voting is used to aggregate answers, i.e., the result is returned upon getting three consistent answers.
- 2.
the symbol ‘‘[]’’ means anything , and its value is not cared.
References
Amazon mechanical turk. https://www.mturk.com/
Adelsman, R.M., Whinston, A.B.: Sophisticated voting with information for two voting functions. Journal of Economic Theory 15(1), 145–159 (1977)
Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: SIGMOD, pp. 207–216 (1993)
Amarilli, A., Amsterdamer, Y., Milo, T.: On the complexity of mining itemsets from the crowd using taxonomies. In: ICDT, pp. 15–25 (2015)
Amsterdamer, Y., Davidson, S., Kukliansky, A., Milo, T., Novgorodov, S., Somech, A.: Managing general and individual knowledge in crowd mining applications. In: CIDR (2015)
Amsterdamer, Y., Davidson, S.B., Milo, T., Novgorodov, S., Somech, A.: Oassis: query driven crowd mining. In: SIGMOD, pp. 589–600. ACM (2014)
Amsterdamer, Y., Davidson, S.B., Milo, T., Novgorodov, S., Somech, A.: Ontology assisted crowd mining. PVLDB 7(13), 1597–1600 (2014)
Amsterdamer, Y., Grossman, Y., Milo, T., Senellart, P.: Crowd mining. In: SIGMOD, pp. 241–252. ACM (2013)
Amsterdamer, Y., Grossman, Y., Milo, T., Senellart, P.: Crowdminer: Mining association rules from the crowd. PVLDB 6(12), 1250–1253 (2013)
Amsterdamer, Y., Kukliansky, A., Milo, T.: Nl2cm: A natural language interface to crowd mining. In: SIGMOD, pp. 1433–1438. ACM (2015)
Artikis, A., Weidlich, M., Schnitzler, F., Boutsis, I., Liebig, T., Piatkowski, N., Bockermann, C., Morik, K., Kalogeraki, V., Marecek, J., et al.: Heterogeneous stream processing and crowdsourcing for urban traffic management. In: EDBT, pp. 712–723 (2014)
Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE, pp. 421–430 (2001)
Bradley, R.A., Terry, M.E.: Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika pp. 324–345 (1952)
Busa-Fekete, R., Szorenyi, B., Cheng, W., Weng, P., Hullermeier, E.: Top-k selection based on adaptive sampling of noisy preferences. In: ICML, pp. 1094–1102 (2013)
Chai, C., Li, G., Li, J., Deng, D., Feng, J.: Cost-effective crowdsourced entity resolution: A partial-order approach. In: SIGMOD, pp. 969–984 (2016)
Chen, X., Bennett, P.N., Collins-Thompson, K., Horvitz, E.: Pairwise ranking aggregation in a crowdsourced setting. In: WSDM, pp. 193–202 (2013)
Chen, Z., Fu, R., Zhao, Z., Liu, Z., Xia, L., Chen, L., Cheng, P., Cao, C.C., Tong, Y., Zhang, C.J.: gmission: a general spatial crowdsourcing platform. PVLDB 7(13), 1629–1632 (2014)
Chilton, L.B., Little, G., Edge, D., Weld, D.S., Landay, J.A.: Cascade: crowdsourcing taxonomy creation. In: CHI, pp. 1999–2008 (2013). doi: 10.1145/2470654.2466265
Chung, Y., Mortensen, M.L., Binnig, C., Kraska, T.: Estimating the impact of unknown unknowns on aggregate query results. In: SIGMOD, pp. 861–876 (2016). doi: 10.1145/2882903.2882909
Davidson, S.B., Khanna, S., Milo, T., Roy, S.: Using the crowd for top-k and group-by queries. In: ICDT, pp. 225–236 (2013)
Demartini, G., Difallah, D.E., Cudré-Mauroux, P.: Zencrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In: WWW, pp. 469–478 (2012)
Deng, D., Li, G., Hao, S., Wang, J., Feng, J.: Massjoin: A mapreduce-based method for scalable string similarity joins. In: ICDE, pp. 340–351 (2014)
Deng, D., Li, G., Wen, H., Feng, J.: An efficient partition based method for exact set similarity joins. PVLDB 9(4), 360–371 (2015)
Deng, D., Shahabi, C., Demiryurek, U.: Maximizing the number of worker’s self-selected tasks in spatial crowdsourcing. In: SIGSPATIAL, pp. 324–333. ACM (2013)
Deng, D., Tao, Y., Li, G.: Overlap set similarity joins with theoretical guarantees. In: SIGMOD (2018)
Elo, A.E.: The rating of chessplayers, past and present, vol. 3. Batsford London (1978)
Eriksson, B.: Learning to top-k search using pairwise comparisons. In: AISTATS, pp. 265–273 (2013)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences 66(4), 614–656 (2003)
Fan, J., Lu, M., Ooi, B.C., Tan, W.C., Zhang, M.: A hybrid machine-crowdsourcing system for matching web tables. In: ICDE, pp. 976–987. IEEE (2014)
Fan, J., Wei, Z., Zhang, D., Yang, J., Du, X.: Distribution-aware crowdsourced entity collection. IEEE Trans. Knowl. Data Eng. (2017)
Feige, U., Raghavan, P., Peleg, D., Upfal, E.: Computing with noisy information. SIAM J. Comput. pp. 1001–1018 (1994)
Feng, J., Wang, J., Li, G.: Trie-join: a trie-based method for efficient string similarity joins. VLDB J. 21(4), 437–461 (2012)
Gokhale, C., Das, S., Doan, A., Naughton, J.F., Rampalli, N., Shavlik, J.W., Zhu, X.: Corleone: hands-off crowdsourcing for entity matching. In: SIGMOD, pp. 601–612 (2014)
Gomes, R., Welinder, P., Krause, A., Perona, P.: Crowdclustering. In: NIPS, pp. 558–566 (2011)
Groz, B., Milo, T.: Skyline queries with noisy comparisons. In: PODS, pp. 185–198 (2015)
Gruenheid, A., Kossmann, D., Ramesh, S., Widmer, F.: Crowdsourcing entity resolution: When is A=B? Technical report, ETH Zürich
Guo, S., Parameswaran, A.G., Garcia-Molina, H.: So who won?: dynamic max discovery with the crowd. In: SIGMOD, pp. 385–396 (2012)
ul Hassan, U., Curry, E.: A multi-armed bandit approach to online spatial task assignment. In: UIC (2014)
Heikinheimo, H., Ukkonen, A.: The crowd-median algorithm. In: HCOMP (2013)
Herbrich, R., Minka, T., Graepel, T.: Trueskill: A bayesian skill rating system. In: NIPS, pp. 569–576 (2006)
Hu, H., Li, G., Bao, Z., Feng, J.: Crowdsourcing-based real-time urban traffic speed estimation: From speed to trend. In: ICDE, pp. 883–894 (2016)
Hu, H., Li, G., Bao, Z., Feng, J., Wu, Y., Gong, Z., Xu, Y.: Top-k spatio-textual similarity join. IEEE Trans. Knowl. Data Eng. 28(2), 551–565 (2016)
Hu, H., Zheng, Y., Bao, Z., Li, G., Feng, J.: Crowdsourced poi labelling: Location-aware result inference and task assignment. In: ICDE, pp. 61–72 (2016)
Jeffery, S.R., Franklin, M.J., Halevy, A.Y.: Pay-as-you-go user feedback for dataspace systems. In: SIGMOD, pp. 847–860 (2008)
Jiang, X., Lim, L.H., Yao, Y., Ye, Y.: Statistical ranking and combinatorial hodge theory. Math. Program. pp. 203–244 (2011)
Jiang, Y., Li, G., Feng, J., Li, W.: String similarity joins: An experimental evaluation. PVLDB 7(8), 625–636 (2014)
Kaplan, H., Lotosh, I., Milo, T., Novgorodov, S.: Answering planning queries with the crowd. PVLDB 6(9), 697–708 (2013)
Kazemi, L., Shahabi, C.: Geocrowd: enabling query answering with spatial crowdsourcing. In: SIGSPATIAL, pp. 189–198. ACM (2012)
Kazemi, L., Shahabi, C., Chen, L.: Geotrucrowd: trustworthy query answering with spatial crowdsourcing. In: SIGSPATIAL, pp. 304–313 (2013)
Khan, A.R., Garcia-Molina, H.: Hybrid strategies for finding the max with the crowd. Tech. rep. (2014)
Li, G., Deng, D., Feng, J.: A partition-based method for string similarity joins with edit-distance constraints. ACM Trans. Database Syst. 38(2), 9:1–9:33 (2013)
Li, G., Deng, D., Wang, J., Feng, J.: PASS-JOIN: A partition-based method for similarity joins. PVLDB 5(3), 253–264 (2011)
Li, G., He, J., Deng, D., Li, J.: Efficient similarity join and search on multi-attribute data. In: SIGMOD, pp. 1137–1151 (2015)
Li, K., Li, X.Z.G., Feng, J.: A rating-ranking based framework for crowdsourced top-k computation. In: SIGMOD, pp. 1–16 (2018)
Lofi, C., Maarry, K.E., Balke, W.: Skyline queries in crowd-enabled databases. In: EDBT, pp. 465–476 (2013)
Lofi, C., Maarry, K.E., Balke, W.: Skyline queries over incomplete data - error models for focused crowd-sourcing. In: ER, pp. 298–312 (2013)
Lotosh, I., Milo, T., Novgorodov, S.: Crowdplanr: Planning made easy with crowd. In: ICDE, pp. 1344–1347. IEEE (2013)
Marcus, A., Karger, D.R., Madden, S., Miller, R., Oh, S.: Counting with the crowd. PVLDB 6(2), 109–120 (2012)
Marcus, A., Wu, E., Karger, D.R., Madden, S., Miller, R.C.: Human-powered sorts and joins. PVLDB 5(1), 13–24 (2011)
de Marneffe, M.C., MacCartney, B., Manning, C.D.: Generating typed dependency parses from phrase structure parses. In: LREC (2006)
Mozafari, B., Sarkar, P., Franklin, M., Jordan, M., Madden, S.: Scaling up crowd-sourcing to very large datasets: a case for active learning. PVLDB 8(2), 125–136 (2014)
Nau, D., Ghallab, M., Traverso, P.: Automated Planning: Theory & Practice. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2004)
Negahban, S., Oh, S., Shah, D.: Iterative ranking from pair-wise comparisons. In: NIPS, pp. 2483–2491 (2012)
Nguyen, Q.V.H., Nguyen, T.T., Miklós, Z., Aberer, K., Gal, A., Weidlich, M.: Pay-as-you-go reconciliation in schema matching networks. In: ICDE, pp. 220–231. IEEE (2014)
Parameswaran, A.G., Boyd, S., Garcia-Molina, H., Gupta, A., Polyzotis, N., Widom, J.: Optimal crowd-powered rating and filtering algorithms. PVLDB 7(9), 685–696 (2014)
Parameswaran, A.G., Garcia-Molina, H., Park, H., Polyzotis, N., Ramesh, A., Widom, J.: Crowdscreen: algorithms for filtering data with humans. In: SIGMOD, pp. 361–372 (2012)
Parameswaran, A.G., Sarma, A.D., Garcia-Molina, H., Polyzotis, N., Widom, J.: Human-assisted graph search: it’s okay to ask questions. PVLDB 4(5), 267–278 (2011)
Park, H., Widom, J.: Crowdfill: collecting structured data from the crowd. In: SIGMOD, pp. 577–588 (2014)
Pfeiffer, T., Gao, X.A., Chen, Y., Mao, A., Rand, D.G.: Adaptive polling for information aggregation. In: AAAI (2012)
Pomerol, J.C., Barba-Romero, S.: Multicriterion decision in management: principles and practice, vol. 25. Springer (2000)
Pournajaf, L., Xiong, L., Sunderam, V., Goryczka, S.: Spatial task assignment for crowd sensing with cloaked locations. In: MDM, vol. 1, pp. 73–82. IEEE (2014)
Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDBJ 10(4), 334–350 (2001)
Rekatsinas, T., Deshpande, A., Parameswaran, A.G.: Crowdgather: Entity extraction over structured domains. CoRR abs/1502.06823 (2015)
Sarawagi, S., Bhamidipaty, A.: Interactive deduplication using active learning. In: SIGKDD, pp. 269–278 (2002)
Sarma, A.D., Jain, A., Nandi, A., Parameswaran, A., Widom, J.: Jellybean: Crowd-powered image counting algorithms. Technical report, Stanford University
Sarma, A.D., Parameswaran, A.G., Garcia-Molina, H., Halevy, A.Y.: Crowd-powered find algorithms. In: ICDE, pp. 964–975 (2014)
Shang, Z., Liu, Y., Li, G., Feng, J.: K-join: Knowledge-aware similarity join. IEEE Trans. Knowl. Data Eng. 28(12), 3293–3308 (2016)
Su, H., Zheng, K., Huang, J., Jeung, H., Chen, L., Zhou, X.: Crowdplanner: A crowd-based route recommendation system. In: ICDE, pp. 1144–1155. IEEE (2014)
Su, H., Zheng, K., Huang, J., Liu, T., Wang, H., Zhou, X.: A crowd-based route recommendation system-crowdplanner. In: ICDE, pp. 1178–1181 (2014)
Sun, J., Shang, Z., Li, G., Deng, D., Bao, Z.: Dima: A distributed in-memory similarity-based query processing system. PVLDB 10(12), 1925–1928 (2017)
Ta, N., Li, G., Feng, J.: An efficient ride-sharing framework for maximizing shared route. TKDE 32(9), 3001–3015 (2017)
Talamadupula, K., Kambhampati, S., Hu, Y., Nguyen, T.A., Zhuo, H.H.: Herding the crowd: Automated planning for crowdsourced planning. In: HCOMP (2013)
To, H., Ghinita, G., Shahabi, C.: A framework for protecting worker location privacy in spatial crowdsourcing. PVLDB 7(10), 919–930 (2014)
Trushkowsky, B., Kraska, T., Franklin, M.J., Sarkar, P.: Crowdsourced enumeration queries. In: ICDE, pp. 673–684 (2013)
Venetis, P., Garcia-Molina, H., Huang, K., Polyzotis, N.: Max algorithms in crowdsourcing environments. In: WWW, pp. 989–998 (2012)
Verroios, V., Garcia-Molina, H.: Entity resolution with crowd errors. In: ICDE, pp. 219–230 (2015)
Verroios, V., Garcia-Molina, H., Papakonstantinou, Y.: Waldo: An adaptive human interface for crowd entity resolution. In: SIGMOD, pp. 1133–1148. ACM (2017)
Vesdapunt, N., Bellare, K., Dalvi, N.N.: Crowdsourcing algorithms for entity resolution. PVLDB 7(12), 1071–1082 (2014)
Wang, J., Kraska, T., Franklin, M.J., Feng, J.: CrowdER: crowdsourcing entity resolution. PVLDB 5(11), 1483–1494 (2012)
Wang, J., Li, G., Feng, J.: Trie-join: Efficient trie-based string similarity joins with edit-distance constraints. PVLDB 3(1), 1219–1230 (2010)
Wang, J., Li, G., Feng, J.: Fast-join: An efficient method for fuzzy token matching based string similarity join. In: ICDE, pp. 458–469 (2011)
Wang, J., Li, G., Feng, J.: Can we beat the prefix filtering?: an adaptive framework for similarity join and search. In: SIGMOD, pp. 85–96 (2012)
Wang, J., Li, G., Feng, J.: Extending string similarity join to tolerant fuzzy token matching. ACM Trans. Database Syst. 39(1), 7:1–7:45 (2014)
Wang, J., Li, G., Kraska, T., Franklin, M.J., Feng, J.: Leveraging transitive relations for crowdsourced joins. In: SIGMOD, pp. 229–240 (2013)
Wang, S., Xiao, X., Lee, C.: Crowd-based deduplication: An adaptive approach. In: SIGMOD, pp. 1263–1277 (2015)
Wauthier, F.L., Jordan, M.I., Jojic, N.: Efficient ranking from pairwise comparisons. In: ICML, pp. 109–117 (2013)
Weng, X., Li, G., Hu, H., Feng, J.: Crowdsourced selection on multi-attribute data. In: CIKM, pp. 307–316 (2017)
Whang, S.E., Lofgren, P., Garcia-Molina, H.: Question selection for crowd entity resolution. PVLDB 6(6), 349–360 (2013)
Whang, S.E., McAuley, J., Garcia-Molina, H.: Compare me maybe: Crowd entity resolution interfaces. Technical report, Stanford University
Yan, T., Kumar, V., Ganesan, D.: Crowdsearch: exploiting crowds for accurate real-time image search on mobile phones. In: MobiSys, pp. 77–90 (2010)
Ye, P., EDU, U., Doermann, D.: Combining preference and absolute judgements in a crowd-sourced setting. In: ICML Workshop (2013)
Yi, J., Jin, R., Jain, A.K., Jain, S., Yang, T.: Semi-crowdsourced clustering: Generalizing crowd labeling by robust distance metric learning. In: NIPS, pp. 1781–1789 (2012)
Yu, M., Li, G., Deng, D., Feng, J.: String similarity search and join: a survey. Frontiers of Computer Science 10(3), 399–417 (2016)
Yu, M., Wang, J., Li, G., Zhang, Y., Deng, D., Feng, J.: A unified framework for string similarity search with edit-distance constraint. VLDB J. 26(2), 249–274 (2017)
Zhang, C.J., Chen, L., Jagadish, H.V., Cao, C.C.: Reducing uncertainty of schema matching via crowdsourcing. PVLDB 6(9), 757–768 (2013)
Zhang, C.J., Tong, Y., Chen, L.: Where to: Crowd-aided path selection. PVLDB 7(14), 2005–2016 (2014)
Zhang, X., Li, G., Feng, J.: Crowdsourced top-k algorithms: An experimental evaluation. PVLDB 9(4) (2015)
Zhuang, Y., Li, G., Zhong, Z., Feng, J.: Hike: A hybrid human-machine method for entity alignment in large-scale knowledge bases. In: CIKM, pp. 1917–1926 (2017)
Zhuo, H.H.: Crowdsourced action-model acquisition for planning. In: AAAI, pp. 3439–3446
Author information
Authors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Li, G., Wang, J., Zheng, Y., Fan, J., Franklin, M.J. (2018). Crowdsourced Operators. In: Crowdsourced Data Management. Springer, Singapore. https://doi.org/10.1007/978-981-10-7847-7_7
Download citation
DOI: https://doi.org/10.1007/978-981-10-7847-7_7
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7846-0
Online ISBN: 978-981-10-7847-7
eBook Packages: Computer ScienceComputer Science (R0)