Abstract
The increasing availability of enriched geospatial data has opened up a new domain and enabled the development of more sophisticated location-based services and applications. However, this development has also given rise to various data quality problems, as it is very hard to verify the data for all real-world entities contained in a dataset. In this paper, we propose ARCI, a relative quality indicator that exploits the wide availability of spatio-textual datasets to indicate how confident a user can be in the correctness of a given dataset. ARCI operates in the absence of ground truth and computes the relative quality of an input dataset by cross-referencing its entries among various similar datasets. We also present an algorithm for computing ARCI and evaluate its performance in a preliminary experimental study using real-world datasets.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ge, M., Chondrogiannis, T. (2017). Assessing the Quality of Spatio-Textual Datasets in the Absence of Ground Truth. In: Kirikova, M., et al. New Trends in Databases and Information Systems. ADBIS 2017. Communications in Computer and Information Science, vol 767. Springer, Cham. https://doi.org/10.1007/978-3-319-67162-8_2
DOI: https://doi.org/10.1007/978-3-319-67162-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67161-1
Online ISBN: 978-3-319-67162-8
eBook Packages: Computer Science (R0)