Abstract
Join operations are hard to perform with high quality by machines alone, so we adopt crowdsourcing to improve join quality. Because workers are paid to verify the generated pairs, the overall cost grows with the number of pairs and can become high. We propose CSCER, a hybrid approach that generates pairs by leveraging attributes and combines category, sorting, and clustering techniques. We also propose an adaptive attribute-selection strategy to generate pairs efficiently from attributes. Experiments on a real crowdsourcing platform with real datasets indicate that our approaches reduce the overall cost compared with existing methods while achieving high-quality join results.
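The general idea of using attributes to cut down the pairs sent to the crowd can be illustrated with a small sketch. This is not the authors' CSCER algorithm, whose details are not given here. It is a hypothetical example that assumes each record has one categorical attribute (here `category`) and one numeric attribute (here `price`). Records are first grouped by category, then sorted on the numeric attribute within each group, and then split into clusters wherever two neighbouring values differ by more than an assumed threshold `max_gap`. Only pairs inside a cluster are kept as candidates for crowd verification. The names `candidate_pairs`, `records`, and `max_gap` are illustrative.

```python
from itertools import combinations


def candidate_pairs(records, cat_attr, sort_attr, max_gap):
    """Hypothetical attribute-based pair generation (not the authors' CSCER).

    1. Category: only records sharing the same cat_attr value may match.
    2. Sorting: within a category, order records by the numeric sort_attr.
    3. Clustering: start a new cluster whenever two sorted neighbours differ
       by more than max_gap; only pairs inside a cluster are kept.
    """
    # Step 1: group record ids by their category value.
    blocks = {}
    for rid, rec in records.items():
        blocks.setdefault(rec[cat_attr], []).append(rid)

    pairs = set()
    for ids in blocks.values():
        # Step 2: sort the block on the numeric attribute.
        ordered = sorted(ids, key=lambda r: records[r][sort_attr])
        # Step 3: split the sorted block into gap-based clusters.
        cluster = [ordered[0]]
        clusters = [cluster]
        for prev, cur in zip(ordered, ordered[1:]):
            if records[cur][sort_attr] - records[prev][sort_attr] > max_gap:
                cluster = [cur]
                clusters.append(cluster)
            else:
                cluster.append(cur)
        # Every pair inside a cluster becomes a candidate for crowd verification.
        for members in clusters:
            for a, b in combinations(sorted(members), 2):
                pairs.add((a, b))
    return pairs


# Hypothetical toy records: id -> attribute values.
records = {
    1: {"category": "phone", "price": 199},
    2: {"category": "phone", "price": 205},
    3: {"category": "phone", "price": 899},
    4: {"category": "laptop", "price": 999},
    5: {"category": "laptop", "price": 1010},
}

print(sorted(candidate_pairs(records, "category", "price", max_gap=50)))
# [(1, 2), (4, 5)]
```

On these five records, comparing everything would produce 10 pairs. The sketch keeps only 2, so only 2 crowd verifications would be paid for. The trade-off is recall: a threshold or attribute choice that is too aggressive can drop true matches. Choosing which attributes to use is the kind of decision an attribute-selection strategy would have to make.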
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Feng, J., Feng, J., Hu, H. (2014). Leveraging Attributes and Crowdsourcing for Join. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_47
DOI: https://doi.org/10.1007/978-3-319-08010-9_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer Science, Computer Science (R0)