Uncertainty in Crowd Data Sourcing Under Structural Constraints

Amarilli, Antoine; Amsterdamer, Yael; Milo, Tova

doi:10.1007/978-3-662-43984-5_27

Conference paper
First Online: 01 January 2014

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8505)

Abstract

Applications extracting data from crowdsourcing platforms must deal with the uncertainty of crowd answers in two different ways: first, by deriving estimates of the correct value from the answers; second, by choosing crowd questions whose answers are expected to minimize this uncertainty relative to the overall data collection goal. Such problems are already challenging when we assume that questions are unrelated and answers are independent, but they are even more complicated when we assume that the unknown values follow hard structural constraints (such as monotonicity).

In this vision paper, we examine how to formally address this issue with an approach inspired by [2]. We describe a generalized setting where we model constraints as linear inequalities, and use them to guide the choice of crowd questions and the processing of answers. We present the main challenges arising in this setting, and propose directions to solve them.
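To make the two ingredients above concrete (estimating values from crowd answers, and enforcing a hard structural constraint), here is a minimal sketch under stated assumptions. It is not the method of the paper: it simply averages the answers for each item and then projects the averages onto the monotonicity constraints x_1 <= x_2 <= ... <= x_n, a special case of linear inequalities, using the standard pool-adjacent-violators procedure for weighted least squares. The function names, the sample answers, and the choice of answer counts as weights are all hypothetical.

```python
from statistics import mean


def average_answers(answers_per_item):
    """Estimate each unknown value by the mean of its crowd answers."""
    return [mean(answers) for answers in answers_per_item]


def project_monotone(values, weights=None):
    """Weighted least-squares projection onto non-decreasing sequences.

    Implements pool adjacent violators: neighbouring blocks that violate
    the order are merged and replaced by their weighted mean.
    """
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block is [weighted mean, total weight, item count]
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Merge backwards while the last two blocks are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for block_mean, _, count in blocks:
        fitted.extend([block_mean] * count)
    return fitted


# Hypothetical answers for four items whose true values are assumed
# to be non-decreasing from the first item to the last.
answers = [[1, 2, 3], [4, 5], [2, 4], [6, 6, 6]]
raw = average_answers(answers)        # per-item means, ignoring the constraint
weights = [len(a) for a in answers]   # more answers, more trust (an assumption)
fitted = project_monotone(raw, weights)
print(raw)
print(fitted)
```

In this toy run the second and third averages violate the assumed order, so the projection pools them into a single shared estimate. General linear inequalities would need a quadratic-programming solver rather than this specialised procedure.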

Notes

1. This assumption holds when we are interested in the average crowd answer, e.g., the average rating for a compression quality; and in the many cases where the errors of worker answers tend to cancel out so that the average is close to the truth [2].
2. Note that likelihood cannot, however, be seen as a probability distribution on \(\varTheta\).
3. Note that this also changes the way of fitting distributions when computing the error decrease under possible additional samples.

References

1. Amarilli, A., Amsterdamer, Y., Milo, T.: On the complexity of mining itemsets from the crowd using taxonomies. In: Proceedings of ICDT (to appear), Athens (2014)
2. Amsterdamer, Y., Grossman, Y., Milo, T., Senellart, P.: Crowd mining. In: Proceedings of SIGMOD, New York, USA, pp. 241–252 (2013)
3. David, H.A., Nagaraja, H.N.: Order Statistics, Chapter 2, p. 14. Wiley, New York (2013)
4. Karp, R.M., Kleinberg, R.: Noisy binary search and its applications. In: Proceedings of 18th ACM-SIAM Symposium on Discrete Algorithms (2007)
5. Kozlov, M.K., Tarasov, S.P., Khachiyan, L.G.: The polynomial solvability of convex quadratic programming. USSR Comp. Math. Math. Phys. 20(5), 223–228 (1980)
6. Amazon Mechanical Turk. https://www.mturk.com/
7. Parameswaran, A., Garcia-Molina, H., Park, H., Polyzotis, N., Ramesh, A., Widom, J.: Crowdscreen: Algorithms for filtering data with humans. In: Proceedings of SIGMOD (2012)
8. Parameswaran, A., Sarma, A., Garcia-Molina, H., Polyzotis, N., Widom, J.: Human-assisted graph search: it’s okay to ask questions. Proc. VLDB 4(5), 267–278 (2011)
9. Parkes, D.C., Ungar, L.H.: Iterative combinatorial auctions: theory and practice. In: Proceedings of AAAI/IAAI (2000)
10. Triantaphyllou, E.: Data Mining and Knowledge Discovery by Logic-Based Methods, Chapter 10. Springer, New York (2010)
11. Trushkowsky, B., Kraska, T., Franklin, M.J., Sarkar, P.: Crowdsourced enumeration queries. In: Proceedings of ICDE (2013)
12. Yang, X., Cheng, R., Mo, L., Kao, B., Cheung, D.: On incentive-based tagging. In: Proceedings of ICDE (2013)

Acknowledgements

This work has been partially funded by the European Research Council under the FP7, ERC grant MoDaS, agreement 291071, and by the Israel Ministry of Science.

Author information

Authors and Affiliations

Institut Mines–Télécom; Télécom ParisTech; CNRS LTCI, Paris, France
Antoine Amarilli
Tel Aviv University, Tel Aviv, Israel
Yael Amsterdamer & Tova Milo

Corresponding author

Correspondence to Antoine Amarilli.

Editor information

Editors and Affiliations

Pohang University of Science and Technology (POSTECH), Pohang, South Korea
Wook-Shin Han
National University of Singapore, Singapore, Singapore
Mong Li Lee
Udayana University, Badung, Indonesia
Agus Muliantara
Udayana University, Badung, Indonesia
Ngurah Agus Sanjaya
Christian-Albrechts-Universität zu Kiel Institut für Informatik, Kiel, Germany
Bernhard Thalheim
Fudan University, Shanghai, China
Shuigeng Zhou

Cite this paper

Amarilli, A., Amsterdamer, Y., Milo, T. (2014). Uncertainty in Crowd Data Sourcing Under Structural Constraints. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_27

DOI: https://doi.org/10.1007/978-3-662-43984-5_27
Published: 11 July 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43983-8
Online ISBN: 978-3-662-43984-5
eBook Packages: Computer Science, Computer Science (R0)