Query Similarity for Approximate Query Answering

Kantere, Verena

doi:10.1007/978-3-319-44406-2_29

Verena Kantere

Conference paper
First Online: 06 August 2016

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9828)

Abstract

Query rewriting in heterogeneous environments assumes mappings that are complete. In reality, and especially in the Big Data era, it is rarely the case that such complete sets of mappings exist between sources; partial mappings are the norm rather than the exception. So, in practice, existing rewriting algorithms fail in the majority of cases. The solution is to approximate original queries with others that can be answered by existing mappings. Approximate queries bear some similarity to original ones in terms of structure and semantics. In this paper we investigate the notion of such query similarity and introduce the use of query similarity functions to this end. We also present a methodology for the construction of such functions. We incorporate exemplary similarity functions created with the proposed methodology into recent algorithms for approximate query answering and show experimental results on the influence of the similarity function on the efficiency of the algorithms.

This research is funded by the EU FP7 project ASAP, under Grant Agreement no. 619706.

Notes

1. The problem of storing approximate answers to the database on which \(Q_{orig}\) is posed is related to database versioning and is out of the scope of this paper.
2. Intuitively, \(Q_{apprx}\) deviates from \(Q_{orig}\) only by additional constraints and not by additional 'select' attributes.
3. Query elements that correspond to 'select' attributes have a 1-1 correspondence in both \(Q_{orig}^{SQL}\) and \(Q_{orig}^{conj}\). In the conjunctive form, these are called distinguished variables.
4. Actually, this is guaranteed by the classical query rewriting methodology, which creates contained rewritten versions.
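
As a rough illustration of notes 2 and 3, the following minimal sketch models a conjunctive query as a set of 'select' attributes (distinguished variables) plus a set of constraints, and checks whether an approximate query deviates from the original only by additional constraints. The `ConjunctiveQuery` class, both function names, and the example queries are hypothetical. The Jaccard measure is only a stand-in to show what a similarity function over queries might look like; it is an assumption and not one of the similarity functions proposed in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConjunctiveQuery:
    # Hypothetical toy model: 'select' attributes (distinguished
    # variables) plus a set of constraints written as plain strings.
    select: frozenset
    constraints: frozenset


def deviates_only_by_constraints(q_orig, q_apprx):
    """True if q_apprx has the same 'select' attributes as q_orig and
    keeps all of its constraints, possibly adding more (one literal
    reading of note 2)."""
    return (q_apprx.select == q_orig.select
            and q_orig.constraints <= q_apprx.constraints)


def jaccard_similarity(q_orig, q_apprx):
    """Illustrative similarity only (an assumption, not the paper's
    function): Jaccard overlap of the two constraint sets."""
    union = q_orig.constraints | q_apprx.constraints
    if not union:
        return 1.0  # two queries without constraints are treated as identical
    return len(q_orig.constraints & q_apprx.constraints) / len(union)


# Made-up example queries over a hypothetical schema.
q_orig = ConjunctiveQuery(frozenset({"name"}), frozenset({"age > 30"}))
q_apprx = ConjunctiveQuery(
    frozenset({"name"}),
    frozenset({"age > 30", "city = 'Geneva'"}),
)

print(deviates_only_by_constraints(q_orig, q_apprx))
print(jaccard_similarity(q_orig, q_apprx))
```

Under this toy model, adding a constraint can only narrow the answer set, which is in the spirit of the contained rewritings mentioned in note 4; a realistic similarity function would also need to account for structure and semantics, as the abstract indicates.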

Author information

Authors and Affiliations

University of Geneva, Geneva, Switzerland
Verena Kantere

Corresponding author

Correspondence to Verena Kantere.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma

About this paper

Cite this paper

Kantere, V. (2016). Query Similarity for Approximate Query Answering. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9828. Springer, Cham. https://doi.org/10.1007/978-3-319-44406-2_29

DOI: https://doi.org/10.1007/978-3-319-44406-2_29
Published: 06 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44405-5
Online ISBN: 978-3-319-44406-2
eBook Packages: Computer Science, Computer Science (R0)