Processing top-N relational queries by learning

Zhu, Liang; Meng, Weiyi; Liu, Chunnian; Yang, Wenzhu; Liu, Dazhong

doi:10.1007/s10844-009-0078-7

Published: 14 February 2009

Volume 34, pages 21–55 (2010)

Journal of Intelligent Information Systems

Liang Zhu^1,2, Weiyi Meng^3, Chunnian Liu^1, Wenzhu Yang^2 & Dazhong Liu^2

Abstract

A top-N selection query against a relation finds the N tuples that best satisfy the query condition, though not necessarily completely. In this paper, we propose a new, learning-based method for evaluating top-N queries against a relation. The method first finds and saves the optimal search spaces for a small number of random top-N queries, and then uses this learned knowledge to evaluate new queries. Extensive experiments measure the performance of this strategy, and the results indicate that it is highly competitive with existing techniques for both low-dimensional and high-dimensional data. Furthermore, the knowledge base can be updated based on new user queries to reflect new query patterns, so that frequently submitted queries can be processed most efficiently. The maintenance and stability of the knowledge base are also addressed in the paper.
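
The idea of reusing learned search spaces can be illustrated with a small Python sketch. It is not the paper's algorithm, only a toy under stated assumptions: the relation is an in-memory list of numeric tuples, "satisfying the condition best" is taken to mean smallest Euclidean distance to a query point, and the knowledge base is a hypothetical dictionary mapping each training query to the radius of the smallest ball that holds its top-N answer. All function names are invented for illustration. For a new query, the sketch takes the radius stored for the nearest training query and enlarges it by the distance between the two query points; by the triangle inequality that ball still contains at least N tuples, so it also contains the true top-N.

```python
import heapq
import math


def distance(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def top_n_scan(relation, query, n):
    """Baseline: rank every tuple in the relation by distance to the query."""
    return heapq.nsmallest(n, relation, key=lambda t: distance(t, query))


def learn(relation, training_queries, n):
    """Hypothetical knowledge base: training query -> radius of its top-n answer."""
    kb = {}
    for q in training_queries:
        answer = top_n_scan(relation, q, n)
        kb[q] = distance(answer[-1], q)  # distance of the n-th best tuple
    return kb


def top_n_learned(relation, query, n, kb):
    """Answer a new query inside a search space derived from the knowledge base."""
    nearest = min(kb, key=lambda q: distance(q, query))
    # Triangle inequality: this radius covers the n answers of the stored query.
    radius = kb[nearest] + distance(nearest, query)
    # Stand-in for an indexed range selection in a real database system.
    candidates = [t for t in relation if distance(t, query) <= radius]
    if len(candidates) < n:  # guard against floating-point edge cases
        candidates = relation
    return heapq.nsmallest(n, candidates, key=lambda t: distance(t, query))


if __name__ == "__main__":
    relation = [(x, y) for x in range(10) for y in range(10)]
    kb = learn(relation, [(2.0, 7.0), (8.0, 1.0)], n=5)
    print(top_n_learned(relation, (2.3, 7.1), 5, kb))
```

In this sketch the benefit comes only from ranking a smaller candidate set; in a database setting the range filter would be the step that an index can accelerate, which is why a tight search space matters.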

Notes

In all the figures in Section 5.5, we arrange the legends and the corresponding curves in the same order to reduce confusion; the order of legends is from left to right (if any), and then from top to bottom.

References

Balke, W., Nejdl, W., Siberski, W., & Thaden, U. (2005). Progressive distributed top-k retrieval in peer-to-peer networks. In Proceedings of the 21st international conference on data engineering (ICDE’05) (pp. 174–185). Tokyo, Japan.
Bast, H., Majumdar, D., Schenkel, R., Theobald, M., & Weikum, G. (2006). IO-top-k: Index-access optimized top-k query processing. In Proceedings of 29th international conference on very large data bases (VLDB’06) (pp. 475–486). Seoul, Korea.
Bowerman, B. L., & O’Connell, R. T. (1993). Forecasting and time series: An applied approach (3rd ed.). Pacific Grove: Brooks/Cole.
Bruno, N., Chaudhuri, S., & Gravano, L. (2001). STHoles: A multidimensional workload-aware histogram. In Proceedings ACM international conference on management of data (SIGMOD’01) (pp. 211–222). Santa Barbara, California, USA.
Bruno, N., Chaudhuri, S., & Gravano, L. (2002). Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Transactions on Database Systems, 27(2), 153–187. doi:10.1145/568518.568519.
Carey, M., & Kossmann, D. (1997). On saying “enough already!” in SQL. In Proceedings ACM international conference on management of data (SIGMOD’97) (pp. 219–230). Tucson, Arizona, USA.
Carey, M., & Kossmann, D. (1998). Reducing the braking distance of an SQL query engine. In Proceedings of 24th international conference on very large data bases (VLDB’98) (pp. 158–169). New York City, New York, USA.
Chang, Y.-C., Bergman, L. D., Castelli, V., Li, C.-S., Lo, M.-L., & Smith, J. R. (2000). The onion technique: Indexing for linear optimization queries. In Proceedings ACM international conference on management of data (SIGMOD’00) (pp. 391–402). Dallas, Texas, USA.
Chaudhuri, S., & Gravano, L. (1999). Evaluating top-k selection queries. In Proceedings of 25th international conference on very large data bases (VLDB’99) (pp. 397–410). Edinburgh, Scotland, UK.
Chaudhuri, S., Gravano, L., & Marian, A. (2004). Optimizing top-k selection queries over multimedia repositories. IEEE Transactions on Knowledge and Data Engineering, 16(8), 992–1009. doi:10.1109/TKDE.2004.30.
Chen, C., & Ling, Y. (2002). A sampling-based estimator for top-k selection query. In Proceedings of the 18th international conference on data engineering (ICDE’02) (pp. 617–627). San Jose, California.
Chen, Y., & Meng, W. (2003). Top-N query: Query language, distance function, and processing strategies. In International conference on web-age information management (pp. 458–470). Chengdu, China: Springer.
Das, G., Gunopulos, D., & Koudas, N. (2006). Answering top-k queries using views. In Proceedings of 29th international conference on very large data bases (VLDB’06) (pp. 451–462). Seoul, Korea.
Donjerkovic, D., & Ramakrishnan, R. (1999). Probabilistic optimization of top N queries. In Proceedings of 25th international conference on very large data bases (VLDB’99) (pp. 411–422). Edinburgh, Scotland, UK.
Fagin, R., Lotem, A., & Naor, M. (2001). Optimal aggregation algorithms for middleware. In Proceedings of the twentieth ACM symposium on principles of database systems (PODS’01) (pp. 102–113). Santa Barbara, California, USA.
Fleming, W. (1977). Functions of several variables (2nd ed.; 1st ed. Addison-Wesley, 1965). New York: Springer.
Habich, D., Lehner, W., & Hinneburg, A. (2005). Optimizing multiple top-K queries over joins. In Proceedings of the 17th international conference on scientific and statistical database management (pp. 195–204). Santa Barbara, CA, USA.
Hristidis, V., Koudas, N., & Papakonstantinou, Y. (2001). PREFER: A system for the efficient execution of multi-parametric ranked queries. In Proceedings of the 2001 ACM international conference on management of data (SIGMOD’01) (pp. 259–270). Santa Barbara, California, USA.
Hristidis, V., & Papakonstantinou, Y. (2004). Algorithms and applications for answering ranked queries using ranked views. The VLDB Journal, 13(1), 49–70. doi:10.1007/s00778-003-0099-8.
Hwang, S.-W., & Chang, K. C.-C. (2007a). Probe minimization by schedule optimization: Supporting top-k queries with expensive predicates. IEEE Transactions on Knowledge and Data Engineering, 19(5), 646–662. doi:10.1109/TKDE.2007.1007.
Hwang, S.-W., & Chang, K. C.-C. (2007b). Optimizing top-k queries for middleware access: A unified cost-based approach. ACM Transactions on Database Systems, 32(1), 1–41. doi:10.1145/1206049.1206054.
Ilyas, I., Aref, W., & Elmagarmid, A. (2002). Joining ranked inputs in practice. In Proceedings of 28th international conference on very large data bases (VLDB’02) (pp. 950–961). Hong Kong, China.
Ilyas, I., Aref, W., & Elmagarmid, A. (2004a). Supporting top-k join queries in relational databases. The VLDB Journal, 13(3), 207–221. doi:10.1007/s00778-004-0128-2.
Ilyas, I., Shah, R., Aref, W., Vitter, J., & Elmagarmid, A. (2004b). Rank-aware query optimization. In Proceedings ACM international conference on management of data (SIGMOD’04) (pp. 203–214). Paris, France.
Lee, J., Kim, D., & Chung, C. (1999). Multi-dimensional selectivity estimation using compressed histogram information. In Proceedings ACM international conference on management of data (SIGMOD’99) (pp. 205–214). Philadelphia, Pennsylvania, USA.
Li, C., Chang, K., Ilyas, I., & Song, S. (2005). RankSQL: Query algebra and optimization for relational top-k queries. In Proceedings ACM international conference on management of data (SIGMOD’05) (pp. 131–142). Baltimore, Maryland, USA.
Marian, A., Bruno, N., & Gravano, L. (2004). Evaluating top-k queries over web-accessible databases. ACM Transactions on Database Systems, 29(2), 319–362. doi:10.1145/1005566.1005569.
Michel, S., Triantafillou, P., & Weikum, G. (2005). KLEE: A framework for distributed top-k query algorithms. In Proceedings of the 31st international conference on very large data bases (VLDB’05) (pp. 637–648). Trondheim, Norway.
Motro, A. (1988). VAGUE: A user interface to relational databases that permits vague queries. ACM Transactions on Office Information Systems, 6(3), 187–214. doi:10.1145/45945.48027.
Silberschatz, A., Korth, H. F., & Sudarshan, S. (2002). Database system concepts (4th ed.). New York: McGraw-Hill.
Soliman, M. A., Chang, K. C.-C., & Ilyas, I. F. (2007). Top-k query processing in uncertain databases. In Proceedings of the 2007 IEEE 23rd international conference on data engineering (ICDE’07) (pp. 896–905). Istanbul, Turkey.
Theobald, M., Weikum, G., & Schenkel, R. (2004). Top-k query evaluation with probabilistic guarantees. In Proceedings of the thirtieth international conference on very large data bases (VLDB’04) (pp. 648–659). Toronto, Canada.
Vlachou, A., Doulkeridis, C., Nørvåg, K., & Vazirgiannis, M. (2008). On efficient top-k query processing in highly distributed environments. In Proceedings ACM international conference on management of data (SIGMOD’08) (pp. 753–764). Vancouver, BC, Canada.
Xin, D., Han, J., Cheng, H., & Li, X. (2006). Answering top-k queries with multi-dimensional selections: The ranking cube approach. In Proceedings of 29th international conference on very large data bases (VLDB’06) (pp. 463–474). Seoul, Korea.
Xin, D., Han, J., & Chang, K. C.-C. (2007). Progressive and selective merge: Computing top-k with ad-hoc ranking functions. In Proceedings ACM international conference on management of data (SIGMOD’07) (pp. 103–114). Beijing, China.
Yi, K., Yu, H., Yang, J., Xia, G., & Chen, Y. (2003). Efficient maintenance of materialized top-k views. In Proceedings of the 19th international conference on data engineering (ICDE’03) (pp. 189–200). Bangalore, India.
Yiu, M. L., & Mamoulis, N. (2007). Efficient processing of top-k dominating queries on multi-dimensional data. In Proceedings of 33rd international conference on very large data bases (VLDB’07) (pp. 483–494). Vienna, Austria.
Yu, C., Philip, G., & Meng, W. (2003). Distributed top-N query processing with possibly uncooperative local systems. In Proceedings of 29th international conference on very large data bases (VLDB’03) (pp. 117–128). Berlin, Germany.
Yu, C., Sharma, P., Meng, W., & Qin, Y. (2001). Database selection for processing k nearest neighbors queries in distributed environments. In ACM/IEEE joint conference on digital libraries (JCDL’01) (pp. 215–222). Roanoke, Virginia, USA.
Zhao, K., Tao, Y., & Zhou, S. (2007). Efficient top-k processing in large-scaled distributed environments. Data & Knowledge Engineering, 63(2), 315–335. doi:10.1016/j.datak.2007.03.012.
Zhu, L., & Meng, W. (2004). Learning-based top-N selection query evaluation over relational databases. In Advances in web-age information management: 5th international conference (WAIM’04) (pp. 197–207). Dalian, China.
Zhu, L., Meng, W., Yang, W., & Liu, C. (2008). Region clustering based evaluation of multiple top-N selection queries. Data & Knowledge Engineering, 64(2), 439–461. doi:10.1016/j.datak.2007.09.004.
Zhu, M., Papadias, D., Zhang, J., & Lee, D. (2005). Top-k spatial joins. IEEE Transactions on Knowledge and Data Engineering, 17(4), 567–579. doi:10.1109/TKDE.2005.65.

Acknowledgements

This work is supported in part by the NSFC major research program “Basic Theory and Core Techniques of Non-Canonical Knowledge” (60496322, 60496327) and by NSFHEE (2004305). The authors also thank Nicolas Bruno (an author of Bruno et al. 2002) and Chung-Min Chen (an author of Chen and Ling 2002) for providing some of the test datasets used in this paper, as well as experimental results of their approaches, which made a direct comparison with their results possible.

Author information

Authors and Affiliations

College of Computer Science and Technology, Beijing University of Technology, Beijing, 100124, China
Liang Zhu & Chunnian Liu
Key Laboratory of Machine Learning and Computational Intelligence, School of Mathematics and Computer Science, Hebei University, Baoding, Hebei, 071002, China
Liang Zhu, Wenzhu Yang & Dazhong Liu
Department of Computer Science, State University of New York at Binghamton, Binghamton, NY, 13902, USA
Weiyi Meng

Corresponding author

Correspondence to Liang Zhu.

Additional information

Part of the contents of this paper was published in the Proceedings of WAIM’04.

About this article

Cite this article

Zhu, L., Meng, W., Liu, C. et al. Processing top-N relational queries by learning. J Intell Inf Syst 34, 21–55 (2010). https://doi.org/10.1007/s10844-009-0078-7

Received: 25 July 2008
Revised: 06 January 2009
Accepted: 07 January 2009
Published: 14 February 2009
Issue Date: February 2010
DOI: https://doi.org/10.1007/s10844-009-0078-7
