World Wide Web, Volume 22, Issue 4, pp. 1523–1553

Cleaning uncertain graphs via noisy crowdsourcing

  • Yongcheng Wu
  • Xin Lin
  • Yan Yang
  • Liang He
Part of the topical collection: Special Issue on Web and Big Data


Uncertain graphs are an important data model for many real-world applications. To support queries on uncertain graphs, each edge is associated with an existential probability that represents the likelihood that the edge exists. Almost all work in this area focuses on improving the efficiency of query processing. However, another issue deserves attention: because of edge uncertainty, query results on uncertain graphs are sometimes uninformative. We adopt a crowdsourcing-based approach to make the query results more informative. To reduce the monetary and time costs of crowdsourcing, we must select the optimal edges to clean, that is, those whose cleaning maximizes the quality improvement. However, noise in the crowdsourcing results makes this problem more complex. We prove that the problem is #P-hard and propose an efficient algorithm to derive the optimal edge. Our experimental results show that the proposed algorithm outperforms random selection by up to 22 times in quality improvement and runs up to 5 times faster than the each-edge-comparison approach, demonstrating that it is both effective and efficient.
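A small Python sketch can make this setting concrete. It rests on our own assumptions rather than on the paper's definitions: edges are directed and independent, stored in a dict mapping `(u, v)` to an existential probability; the query is s–t reachability, whose probability is estimated by Monte Carlo sampling of possible worlds (exact computation is #P-hard in general); query "informativeness" is measured by the binary entropy of that probability; and each crowd answer is assumed correct with a fixed probability `accuracy`, folded in with Bayes' rule. The function `best_edge_to_clean` is a hypothetical brute-force baseline that re-estimates the expected entropy for every candidate edge; it is not the optimized algorithm the authors propose. All names are illustrative.

```python
from __future__ import annotations

import math
import random
from collections import deque

# Assumption (not from the paper): a directed uncertain graph is a dict
# {(u, v): existential probability}, and edges exist independently.


def sample_reachable(edges: dict[tuple[str, str], float], s: str, t: str,
                     rng: random.Random) -> bool:
    """Draw one possible world and test s-t reachability in it with BFS."""
    adj: dict[str, list[str]] = {}
    for (u, v), p in edges.items():
        if rng.random() < p:  # edge (u, v) exists in this world with probability p
            adj.setdefault(u, []).append(v)
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def reachability_probability(edges, s, t, samples=10000, seed=0) -> float:
    """Monte Carlo estimate of Pr[t is reachable from s]."""
    rng = random.Random(seed)
    hits = sum(sample_reachable(edges, s, t, rng) for _ in range(samples))
    return hits / samples


def entropy(p: float) -> float:
    """Binary entropy in bits: 1.0 is maximally uninformative, 0.0 is certain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def noisy_update(p: float, answer_yes: bool, accuracy: float) -> float:
    """Bayes update of an edge's probability after one crowd answer that is
    correct with probability `accuracy` (hypothetical noise model)."""
    if answer_yes:
        num = accuracy * p
        den = accuracy * p + (1 - accuracy) * (1 - p)
    else:
        num = (1 - accuracy) * p
        den = (1 - accuracy) * p + accuracy * (1 - p)
    return p if den == 0 else num / den


def best_edge_to_clean(edges, s, t, accuracy, samples=2000, seed=0):
    """Hypothetical brute-force baseline: pick the edge whose noisy cleaning
    minimizes the expected entropy of the s-t reachability probability."""
    best, best_h = None, float("inf")
    for e, p in edges.items():
        p_yes = accuracy * p + (1 - accuracy) * (1 - p)  # Pr[crowd answers "yes"]
        expected_h = 0.0
        for answer, weight in ((True, p_yes), (False, 1 - p_yes)):
            updated = dict(edges)
            updated[e] = noisy_update(p, answer, accuracy)
            expected_h += weight * entropy(
                reachability_probability(updated, s, t, samples, seed))
        if expected_h < best_h:
            best, best_h = e, expected_h
    return best, best_h
```

Reusing the same `seed` for every estimate means each candidate edge is scored against the same stream of random numbers (common random numbers), which reduces sampling noise when edges are compared. The per-edge loop makes the cost grow with the number of edges times the number of samples, which is the kind of expense an efficient selection algorithm aims to avoid.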


Keywords: Social network; Uncertain graph; Noisy crowdsourcing; Graph cleaning; Reachability computation



This research is funded by NSFC (No. 61773167) and the Natural Science Foundation of Shanghai (No. 17ZR1444900).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. East China Normal University, Shanghai, China
