Query Similarity for Approximate Query Answering

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9828)

Abstract

Query rewriting in heterogeneous environments assumes that the mappings between sources are complete. In reality, and especially in the Big Data era, complete sets of mappings rarely exist between sources; partial mappings are the norm rather than the exception. In practice, therefore, existing rewriting algorithms fail in the majority of cases. The solution is to approximate original queries with others that can be answered using the existing mappings. Approximate queries bear some similarity to the original ones in terms of structure and semantics. In this paper we investigate the notion of such query similarity and introduce query similarity functions to capture it. We also present a methodology for constructing such functions. We plug example similarity functions, created with the proposed methodology, into recent algorithms for approximate query answering and show experimental results on the influence of the similarity function on the efficiency of the algorithms.

This research is funded by the EU FP7 project ASAP, under Grant Agreement No. 619706.

Notes

  1. The problem of storing approximate answers to the database on which \(Q_{orig}\) is posed is related to database versioning and is out of the scope of this paper.

  2. Intuitively, \(Q_{apprx}\) deviates from \(Q_{orig}\) only by additional constraints and not by additional 'select' attributes.

  3. Query elements that correspond to 'select' attributes have a 1-1 correspondence in both \(Q_{orig}^{SQL}\) and \(Q_{orig}^{conj}\). In the conjunctive form, these are called distinguished variables.

  4. Actually, this is guaranteed by the classical query rewriting methodology, which creates contained rewritten versions.
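
The condition in notes 2 and 3 can be illustrated with a minimal sketch. It is not the paper's definition or similarity function: the `ConjunctiveQuery` class, the purely syntactic check, and the Jaccard-style score are all assumptions made here for illustration, with constraints kept as opaque strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConjunctiveQuery:
    """Hypothetical, simplified conjunctive query (not from the paper)."""
    distinguished: frozenset  # 'select' attributes, i.e. distinguished variables
    constraints: frozenset    # atoms and predicates, kept as opaque strings


def deviates_only_by_constraints(q_orig, q_apprx):
    """One literal, syntactic reading of note 2 (an assumption):
    q_apprx adds no 'select' attribute and keeps every constraint of q_orig,
    possibly adding more."""
    return (q_apprx.distinguished <= q_orig.distinguished
            and q_orig.constraints <= q_apprx.constraints)


def jaccard_similarity(q_orig, q_apprx):
    """Illustrative similarity score in [0, 1]: Jaccard overlap of all query
    elements. A stand-in chosen for this sketch, not the paper's function."""
    a = q_orig.distinguished | q_orig.constraints
    b = q_apprx.distinguished | q_apprx.constraints
    union = a | b
    if not union:
        return 1.0  # two empty queries are treated as identical
    return len(a & b) / len(union)


# Usage: the approximate query adds one constraint to the original.
q_orig = ConjunctiveQuery(
    distinguished=frozenset({"name"}),
    constraints=frozenset({"Emp(name, dept)"}),
)
q_apprx = ConjunctiveQuery(
    distinguished=frozenset({"name"}),
    constraints=frozenset({"Emp(name, dept)", "dept = 'R&D'"}),
)
ok = deviates_only_by_constraints(q_orig, q_apprx)
score = jaccard_similarity(q_orig, q_apprx)
```

Here `ok` is true because the approximate query only adds the constraint `dept = 'R&D'`, while `score` drops below 1 to reflect the extra element. A real similarity function would presumably also weigh the structure and semantics of the queries, which this set-based sketch ignores.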

Author information

Corresponding author

Correspondence to Verena Kantere.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Kantere, V. (2016). Query Similarity for Approximate Query Answering. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9828. Springer, Cham. https://doi.org/10.1007/978-3-319-44406-2_29

  • DOI: https://doi.org/10.1007/978-3-319-44406-2_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44405-5

  • Online ISBN: 978-3-319-44406-2

  • eBook Packages: Computer Science, Computer Science (R0)
