Assessing the Quality of Spatio-Textual Datasets in the Absence of Ground Truth

  • Mouzhi Ge
  • Theodoros Chondrogiannis
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 767)


The increasing availability of enriched geospatial data has opened up a new domain and enabled the development of more sophisticated location-based services and applications. However, this development has also given rise to various data quality problems, as it is very hard to verify the data for all real-world entities contained in a dataset. In this paper, we propose ARCI, a relative quality indicator that exploits the wide availability of spatio-textual datasets to indicate how confident a user can be in the correctness of a given dataset. ARCI operates in the absence of ground truth and computes the relative quality of an input dataset by cross-referencing its entries among various similar datasets. We also present an algorithm for computing ARCI and report a preliminary experimental evaluation of its performance on real-world datasets.
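
To make the idea of cross-referencing concrete, the following is a minimal sketch, not the ARCI definition or the authors' algorithm, neither of which is given in the abstract. It assumes that two entries refer to the same real-world entity when they lie within a distance threshold and have similar names, and it scores an input dataset by the fraction of its entries confirmed by at least one other dataset. The names `Entry`, `matches`, and `agreement_score`, the thresholds, and the sample places are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Entry:
    """A spatio-textual entry: a name plus a point location (hypothetical schema)."""
    name: str
    lat: float
    lon: float


def haversine_m(a: Entry, b: Entry) -> float:
    """Great-circle distance between two entries, in metres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))


def matches(a: Entry, b: Entry, max_dist_m: float = 50.0, min_name_sim: float = 0.8) -> bool:
    """Assumed matching rule: close in space AND similar in name."""
    if haversine_m(a, b) > max_dist_m:
        return False
    sim = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    return sim >= min_name_sim


def agreement_score(dataset: list[Entry], references: list[list[Entry]]) -> float:
    """Fraction of entries in `dataset` matched by at least one reference dataset.

    This is an illustrative stand-in for a relative quality indicator,
    not the ARCI formula from the paper.
    """
    if not dataset:
        return 0.0
    confirmed = sum(
        1
        for entry in dataset
        if any(matches(entry, other) for ref in references for other in ref)
    )
    return confirmed / len(dataset)


# Made-up sample data: one entry is confirmed by a reference dataset, one is not.
input_ds = [Entry("Cafe Central", 46.4983, 11.3548), Entry("Hotel Greif", 46.4990, 11.3550)]
ref_a = [Entry("Café Central", 46.4983, 11.3549)]
ref_b = [Entry("Pizzeria Roma", 46.5100, 11.3400)]
print(agreement_score(input_ds, [ref_a, ref_b]))
```

The nested scan compares every input entry with every reference entry, which is quadratic in the dataset sizes. For larger inputs, a spatial index or a spatio-textual similarity join would be the natural replacement for that loop.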


Spatio-textual data; Data quality; Relative quality



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Masaryk University, Brno, Czech Republic
  2. Free University of Bozen-Bolzano, South Tyrol, Italy
