Abstract
Query rewriting in heterogeneous environments assumes that the mappings between sources are complete. In practice, and especially in the Big Data era, such complete sets of mappings rarely exist, and partial mappings are the norm rather than the exception. As a result, existing rewriting algorithms fail in the majority of cases. The solution is to approximate the original queries with others that can be answered using the existing mappings. Approximate queries bear some similarity to the original ones in terms of structure and semantics. In this paper we investigate the notion of such query similarity and introduce query similarity functions to capture it. We also present a methodology for constructing such functions. We employ exemplary similarity functions, created with the proposed methodology, in recent algorithms for approximate query answering and show experimental results on the influence of the similarity function on the efficiency of the algorithms.
This research is funded by the EU FP7 project ASAP, under Grant Agreement \(n^o\) 619706.
Notes
1. The problem of storing approximate answers in the database on which \(Q_{orig}\) is posed is related to database versioning and is out of the scope of this paper.
2. Intuitively, \(Q_{apprx}\) deviates from \(Q_{orig}\) only by additional constraints and not by additional ’select’ attributes.
3. Query elements that correspond to ’select’ attributes have a 1-1 correspondence in both \(Q_{orig}^{SQL}\) and \(Q_{orig}^{conj}\). In the conjunctive form, these are called distinguished variables.
4. Actually, this is guaranteed by the classical query rewriting methodology, which creates contained rewritten versions.
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Kantere, V. (2016). Query Similarity for Approximate Query Answering. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science(), vol 9828. Springer, Cham. https://doi.org/10.1007/978-3-319-44406-2_29
DOI: https://doi.org/10.1007/978-3-319-44406-2_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44405-5
Online ISBN: 978-3-319-44406-2
eBook Packages: Computer Science (R0)