
Assessing the Quality of Spatio-Textual Datasets in the Absence of Ground Truth

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 767)

Abstract

The increasing availability of enriched geospatial data has opened up a new domain and has enabled the development of more sophisticated location-based services and applications. However, this development has also given rise to various data quality problems, as it is very hard to verify the data for every real-world entity contained in a dataset. In this paper, we propose ARCI, a relative quality indicator that exploits the wide availability of spatio-textual datasets to indicate how confident a user can be in the correctness of a given dataset. ARCI operates in the absence of ground truth and computes the relative quality of an input dataset by cross-referencing its entries among various similar datasets. We also present an algorithm for computing ARCI and assess its performance in a preliminary experimental evaluation using real-world datasets.
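The abstract does not give ARCI's actual definition, so the following is only a minimal sketch of the general idea of scoring a dataset by cross-referencing its entries against similar datasets, under loudly stated assumptions. Each entry is assumed to be a named point of interest with WGS84 coordinates. Two entries are assumed to describe the same real-world entity when they lie within a hypothetical distance threshold (`max_dist_m`) and their names reach a hypothetical normalized Levenshtein similarity (`min_sim`). The hypothetical score `relative_confidence` is the mean, over input entries, of the fraction of reference datasets that contain a matching entry. This is an illustration of the cross-referencing concept, not the authors' ARCI formula or algorithm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """A spatio-textual entry: a point of interest with a name and WGS84 coordinates."""
    name: str
    lat: float
    lon: float


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 1 - edit distance / length of the longer name."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def haversine_m(p: Entry, q: Entry) -> float:
    """Great-circle distance in metres, assuming a spherical Earth of radius 6,371 km."""
    phi1, phi2 = math.radians(p.lat), math.radians(q.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(q.lon - p.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(min(1.0, h)))


def is_match(e: Entry, f: Entry, max_dist_m: float, min_sim: float) -> bool:
    """Hypothetical rule: same entity if spatially close AND textually similar."""
    return haversine_m(e, f) <= max_dist_m and name_similarity(e.name, f.name) >= min_sim


def relative_confidence(target: list[Entry], references: list[list[Entry]],
                        max_dist_m: float = 100.0, min_sim: float = 0.8) -> float:
    """Hypothetical score: mean over target entries of the fraction of
    reference datasets that contain at least one matching entry."""
    if not target or not references:
        return 0.0
    total = 0.0
    for e in target:
        support = sum(
            1 for ref in references
            if any(is_match(e, f, max_dist_m, min_sim) for f in ref)
        )
        total += support / len(references)
    return total / len(target)


if __name__ == "__main__":
    target = [Entry("Cafe Central", 48.2104, 16.3655), Entry("Ghost Bar", 0.0, 0.0)]
    references = [
        [Entry("Café Central", 48.2105, 16.3656)],
        [Entry("Cafe Central", 48.2104, 16.3655)],
    ]
    print(relative_confidence(target, references))  # 0.5
```

In the usage example, the first entry is corroborated by both reference datasets (about 13 m apart, names differing by one accented character) and the second by neither, giving 0.5. The nested scan compares every pair of entries, so it is quadratic; a practical implementation would more likely prune candidates with a spatial index or a spatio-textual similarity join.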


Notes

  1. http://tour-pedia.org/about/datasets.html.

  2. https://developer.foursquare.com.


Author information

Correspondence to Mouzhi Ge or Theodoros Chondrogiannis.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ge, M., Chondrogiannis, T. (2017). Assessing the Quality of Spatio-Textual Datasets in the Absence of Ground Truth. In: Kirikova, M., et al. New Trends in Databases and Information Systems. ADBIS 2017. Communications in Computer and Information Science, vol 767. Springer, Cham. https://doi.org/10.1007/978-3-319-67162-8_2


  • DOI: https://doi.org/10.1007/978-3-319-67162-8_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67161-1

  • Online ISBN: 978-3-319-67162-8

  • eBook Packages: Computer Science; Computer Science (R0)
