Abstract
Set of t-uples expansion is the task of building a set of t-uples from a corpus, starting from a few example t-uples, called seeds. It requires a ranking mechanism to select the relevant candidates. We propose to use state-of-the-art truth finding algorithms as this ranking mechanism, and we empirically compare their accuracy on the task. Our results show that truth finding algorithms provide a practical and effective solution.
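As a rough illustration of what ranking candidates with a truth finding algorithm can look like, the sketch below scores candidate t-uples with a simple hubs-and-authorities style iteration: a candidate is believed more when trusted sources support it, and a source is trusted more when it supports believed candidates. This is not the authors' implementation and not necessarily one of the algorithms compared in the paper; the function name `rank_candidates`, the `claims` mapping, and the example pages are all hypothetical.

```python
from collections import defaultdict


def rank_candidates(claims, iterations=20):
    """Rank candidate t-uples by a mutually reinforcing belief/trust score.

    claims: dict mapping a source name to the set of candidate t-uples
            extracted from that source (hypothetical input format).
    Returns the candidates sorted from most to least believed.
    """
    # Invert the mapping: which sources support each candidate?
    sources_of = defaultdict(set)
    for source, candidates in claims.items():
        for candidate in candidates:
            sources_of[candidate].add(source)
    if not sources_of:
        return []

    trust = {source: 1.0 for source in claims}
    belief = {candidate: 0.0 for candidate in sources_of}

    for _ in range(iterations):
        # Belief in a candidate is the total trust of its supporting sources.
        for candidate, sources in sources_of.items():
            belief[candidate] = sum(trust[s] for s in sources)
        top = max(belief.values())
        belief = {c: b / top for c, b in belief.items()}

        # Trust in a source is the total belief of the candidates it supports.
        for source, candidates in claims.items():
            trust[source] = sum(belief[c] for c in candidates)
        top = max(trust.values())
        trust = {s: t / top for s, t in trust.items()}

    # Sort by decreasing belief; break ties by the t-uple itself.
    return sorted(belief, key=lambda c: (-belief[c], c))


# Hypothetical example: three pages yielding (country, capital) candidates.
claims = {
    "page_a": {("France", "Paris"), ("Japan", "Tokyo")},
    "page_b": {("France", "Paris"), ("Japan", "Tokyo"), ("Italy", "Rome")},
    "page_c": {("France", "Lyon")},
}
ranked = rank_candidates(claims)
```

In this toy input, the candidate supported only by an otherwise uncorroborated page ends up last, which is the behaviour one would want from a ranking mechanism of this kind.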
Acknowledgment
This work has been partially funded by the Big Data and Market Insights Chair of Télécom ParisTech and supported by the National University of Singapore under a grant from Singapore Ministry of Education for research project number T1 251RES1607.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Er, N.A.S., Ba, M.L., Abdessalem, T., Bressan, S. (2017). Truthfulness of Candidates in Set of t-uples Expansion. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_24
DOI: https://doi.org/10.1007/978-3-319-64468-4_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64467-7
Online ISBN: 978-3-319-64468-4
eBook Packages: Computer Science, Computer Science (R0)