Processing top-N relational queries by learning

Journal of Intelligent Information Systems

Abstract

A top-N selection query against a relation finds the N tuples that best satisfy the query condition, though not necessarily completely. In this paper, we propose a new learning-based method for evaluating top-N queries against a relation. Initially, the method finds and saves the optimal search spaces for a small number of random top-N queries. The learned knowledge is then used to evaluate new queries. Extensive experiments measuring the performance of this strategy indicate that it is highly competitive with existing techniques for both low-dimensional and high-dimensional data. Furthermore, the knowledge base can be updated based on new user queries to reflect new query patterns, so that frequently submitted queries are processed most efficiently. The maintenance and stability of the knowledge base are also addressed in the paper.
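
The learning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes Euclidean distance as the ranking function, an in-memory list of numeric tuples in the unit hypercube as the relation, and a single fixed N, and all names (`dist`, `top_n_scan`, `learn_profiles`, `top_n_learned`) are hypothetical. For each random training query it saves the smallest radius that encloses the top N tuples; a new query then reuses the saved radii, through the triangle inequality, to bound the region it has to search.

```python
import math
import random


def dist(p, q):
    """Euclidean distance (an assumed choice of ranking function)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def top_n_scan(relation, q, n):
    """Baseline: rank every tuple by distance to q and keep the n closest."""
    return sorted(relation, key=lambda t: dist(t, q))[:n]


def learn_profiles(relation, n, num_samples, rng):
    """For a few random queries, save the smallest radius holding the top n."""
    dims = len(relation[0])
    profiles = []
    for _ in range(num_samples):
        p = tuple(rng.random() for _ in range(dims))
        radius = dist(top_n_scan(relation, p, n)[-1], p)
        profiles.append((p, radius))
    return profiles


def top_n_learned(relation, profiles, q, n):
    """Answer a new query by searching only a region derived from the profiles."""
    # Triangle inequality: the ball of radius r + dist(q, p) around q contains
    # the n tuples saved for p, so it must also contain the top n for q.
    bound = min(r + dist(q, p) for p, r in profiles)
    # Simulated range query: a box of half-width `bound` encloses that ball.
    candidates = [
        t for t in relation
        if all(abs(a - b) <= bound for a, b in zip(t, q))
    ]
    ranked = sorted(candidates, key=lambda t: dist(t, q))[:n]
    return ranked, len(candidates)


if __name__ == "__main__":
    rng = random.Random(42)
    relation = [(rng.random(), rng.random()) for _ in range(1000)]
    profiles = learn_profiles(relation, n=5, num_samples=25, rng=rng)
    answer, examined = top_n_learned(relation, profiles, (0.4, 0.6), n=5)
    print(answer)
    print("tuples examined:", examined, "of", len(relation))
```

In this sketch the range filter stands in for a range selection that a relational engine could answer with an index, so the number of tuples examined is a rough proxy for the cost saved compared with a full scan. Adding more profiles near frequently asked queries tightens the bound for those queries.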

Notes

  1. In all the figures in Section 5.5, we arrange the legends and the corresponding curves in the same order to reduce confusion; the order of legends is from left to right (if any), and then from top to bottom.

References

  • Balke, W., Nejdl, W., Siberski, W., & Thaden, U. (2005). Progressive distributed top-k retrieval in peer-to-peer networks. In Proceedings of the 21st international conference on data engineering (ICDE’05) (pp. 174–185). Tokyo, Japan.

  • Bast, H., Majumdar, D., Schenkel, R., Theobald, M., & Weikum, G. (2006). IO-top-k: Index-access optimized top-k query processing. In Proceedings of 29th international conference on very large data bases (VLDB’06) (pp. 475–486). Seoul, Korea.

  • Bowerman, B. L., & O’Connell, R. T. (1993). Forecasting and time series: An applied approach (3rd ed.). Pacific Grove: Brooks/Cole.

  • Bruno, N., Chaudhuri, S., & Gravano, L. (2001). STHoles: A multidimensional workload-aware histogram. In Proceedings ACM international conference on management of data (SIGMOD’01) (pp. 211–222). Santa Barbara, California, USA.

  • Bruno, N., Chaudhuri, S., & Gravano, L. (2002). Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Transactions on Database Systems, 27(2), 153–187. doi:10.1145/568518.568519.

  • Carey, M., & Kossmann, D. (1997). On saying “enough already!” in SQL. In Proceedings ACM international conference on management of data (SIGMOD’97) (pp. 219–230). Tucson, Arizona, USA.

  • Carey, M., & Kossmann, D. (1998). Reducing the braking distance of an SQL query engine. In Proceedings of 24th international conference on very large data bases (VLDB’98) (pp. 158–169). New York City, New York, USA.

  • Chang, Y.-C., Bergman, L. D., Castelli, V., Li, C.-S., Lo, M.-L., & Smith, J. R. (2000). The onion technique: Indexing for linear optimization queries. In Proceedings ACM international conference on management of data (SIGMOD’00) (pp. 391–402). Dallas, Texas, USA.

  • Chaudhuri, S., & Gravano, L. (1999). Evaluating top-k selection queries. In Proceedings of 25th international conference on very large data bases (VLDB’99) (pp. 397–410). Edinburgh, Scotland, UK.

  • Chaudhuri, S., Gravano, L., & Marian, A. (2004). Optimizing top-k selection queries over multimedia repositories. IEEE Transactions on Knowledge and Data Engineering, 16(8), 992–1009. doi:10.1109/TKDE.2004.30.

  • Chen, C., & Ling, Y. (2002). A sampling-based estimator for top-k selection query. In Proceedings of the 18th international conference on data engineering (ICDE’02) (pp. 617–627). San Jose, California.

  • Chen, Y., & Meng, W. (2003). Top-N query: Query language, distance function, and processing strategies. In International conference on web-age information management (pp. 458–470). Chengdu, China: Springer.

  • Das, G., Gunopulos, D., & Koudas, N. (2006). Answering top-k queries using views. In Proceedings of 29th international conference on very large data bases (VLDB’06) (pp. 451–462). Seoul, Korea.

  • Donjerkovic, D., & Ramakrishnan, R. (1999). Probabilistic optimization of top N queries. In Proceedings of 25th international conference on very large data bases (VLDB’99) (pp. 411–422). Edinburgh, Scotland, UK.

  • Fagin, R., Lotem, A., & Naor, M. (2001). Optimal aggregation algorithms for middleware. In Proceedings of the twentieth ACM symposium on principles of database systems (PODS’01) (pp. 102–113). Santa Barbara, California, USA.

  • Fleming, W. (1977). Functions of several variables (2nd ed.). New York: Springer.

  • Habich, D., Lehner, W., & Hinneburg, A. (2005). Optimizing multiple top-K queries over joins. In Proceedings of the 17th international conference on scientific and statistical database management (pp. 195–204). Santa Barbara, CA, USA.

  • Hristidis, V., Koudas, N., & Papakonstantinou, Y. (2001). PREFER: A system for the efficient execution of multi-parametric ranked queries. In Proceedings of the 2001 ACM international conference on management of data (SIGMOD’01) (pp. 259–270). Santa Barbara, California, USA.

  • Hristidis, V., & Papakonstantinou, Y. (2004). Algorithms and applications for answering ranked queries using ranked views. The VLDB Journal, 13(1), 49–70. doi:10.1007/s00778-003-0099-8.

  • Hwang, S.-W., & Chang, K. C.-C. (2007a). Probe minimization by schedule optimization: Supporting top-k queries with expensive predicates. IEEE Transactions on Knowledge and Data Engineering, 19(5), 646–662. doi:10.1109/TKDE.2007.1007.

  • Hwang, S.-W., & Chang, K. C.-C. (2007b). Optimizing top-k queries for middleware access: A unified cost-based approach. ACM Transactions on Database Systems, 32(1), 1–41. doi:10.1145/1206049.1206054.

  • Ilyas, I., Aref, W., & Elmagarmid, A. (2002). Joining ranked inputs in practice. In Proceedings of 28th international conference on very large data bases (VLDB’02) (pp. 950–961). Hong Kong, China.

  • Ilyas, I., Aref, W., & Elmagarmid, A. (2004a). Supporting top-k join queries in relational databases. The VLDB Journal, 13(3), 207–221. doi:10.1007/s00778-004-0128-2.

  • Ilyas, I., Shah, R., Aref, W., Vitter, J., & Elmagarmid, A. (2004b). Rank-aware query optimization. In Proceedings ACM international conference on management of data (SIGMOD’04) (pp. 203–214). Paris, France.

  • Lee, J., Kim, D., & Chung, C. (1999). Multi-dimensional selectivity estimation using compressed histogram information. In Proceedings ACM international conference on management of data (SIGMOD’99) (pp. 205–214). Philadelphia, Pennsylvania, USA.

  • Li, C., Chang, K., Ilyas, I., & Song, S. (2005). RankSQL: Query algebra and optimization for relational top-k queries. In Proceedings ACM international conference on management of data (SIGMOD’05) (pp. 131–142). Baltimore, Maryland, USA.

  • Marian, A., Bruno, N., & Gravano, L. (2004). Evaluating top-k queries over web-accessible databases. ACM Transactions on Database Systems, 29(2), 319–362. doi:10.1145/1005566.1005569.

  • Michel, S., Triantafillou, P., & Weikum, G. (2005). KLEE: A framework for distributed top-k query algorithms. In Proceedings of the 31st international conference on very large data bases (VLDB’05) (pp. 637–648). Trondheim, Norway.

  • Motro, A. (1988). VAGUE: A user interface to relational databases that permits vague queries. ACM Transactions on Office Information Systems, 6(3), 187–214. doi:10.1145/45945.48027.

  • Silberschatz, A., Korth, H. F., & Sudarshan, S. (2002). Database system concepts (4th ed.). New York: McGraw-Hill.

  • Soliman, M. A., Chang, K. C.-C., & Ilyas, I. F. (2007). Top-k query processing in uncertain databases. In Proceedings of the 2007 IEEE 23rd international conference on data engineering (ICDE’07) (pp. 896–905). Istanbul, Turkey.

  • Theobald, M., Weikum, G., & Schenkel, R. (2004). Top-k query evaluation with probabilistic guarantees. In Proceedings of the thirtieth international conference on very large data bases (VLDB’04) (pp. 648–659). Toronto, Canada.

  • Vlachou, A., Doulkeridis, C., Nørvåg, K., & Vazirgiannis, M. (2008). On efficient top-k query processing in highly distributed environments. In Proceedings ACM international conference on management of data (SIGMOD’08) (pp. 753–764). Vancouver, BC, Canada.

  • Xin, D., Han, J., Cheng, H., & Li, X. (2006). Answering top-k queries with multi-dimensional selections: The ranking cube approach. In Proceedings of 29th international conference on very large data bases (VLDB’06) (pp. 463–474). Seoul, Korea.

  • Xin, D., Han, J., & Chang, K. C.-C. (2007). Progressive and selective merge: Computing top-k with ad-hoc ranking functions. In Proceedings ACM international conference on management of data (SIGMOD’07) (pp. 103–114). Beijing, China.

  • Yi, K., Yu, H., Yang, J., Xia, G., & Chen, Y. (2003). Efficient maintenance of materialized top-k views. In Proceedings of the 19th international conference on data engineering (ICDE’03) (pp. 189–200). Bangalore, India.

  • Yiu, M. L., & Mamoulis, N. (2007). Efficient processing of top-k dominating queries on multi-dimensional data. In Proceedings of 33rd international conference on very large data bases (VLDB’07) (pp. 483–494). Vienna, Austria.

  • Yu, C., Philip, G., & Meng, W. (2003). Distributed top-N query processing with possibly uncooperative local systems. In Proceedings of 29th international conference on very large data bases (VLDB’03) (pp. 117–128). Berlin, Germany.

  • Yu, C., Sharma, P., Meng, W., & Qin, Y. (2001). Database selection for processing k nearest neighbors queries in distributed environments. In ACM/IEEE joint conference on digital libraries (JCDL’01) (pp. 215–222). Roanoke, Virginia, USA.

  • Zhao, K., Tao, Y., & Zhou, S. (2007). Efficient top-k processing in large-scaled distributed environments. Data & Knowledge Engineering, 63(2), 315–335. doi:10.1016/j.datak.2007.03.012.

  • Zhu, L., & Meng, W. (2004). Learning-based top-N selection query evaluation over relational databases. In Advances in web-age information management: 5th international conference (WAIM’04) (pp. 197–207). Dalian, China.

  • Zhu, L., Meng, W., Yang, W., & Liu, C. (2008). Region clustering based evaluation of multiple top-N selection queries. Data & Knowledge Engineering, 64(2), 439–461. doi:10.1016/j.datak.2007.09.004.

  • Zhu, M., Papadias, D., Zhang, J., & Lee, D. (2005). Top-k spatial joins. IEEE Transactions on Knowledge and Data Engineering, 17(4), 567–579. doi:10.1109/TKDE.2005.65.

Acknowledgements

This work is supported in part by the NSFC major research program “Basic Theory and Core Techniques of Non-Canonical Knowledge” (60496322, 60496327) and NSFHEE (2004305). The authors would also like to express their gratitude to Nicolas Bruno (an author of Bruno et al. 2002) and Chung-Min Chen (an author of Chen and Ling 2002) for providing some of the test datasets used in this paper. They also provided experimental results for their approaches, which made it possible to compare directly against those results in this paper.

Author information

Corresponding author

Correspondence to Liang Zhu.

Additional information

Part of the contents of this paper was published in the Proceedings of WAIM’04.

About this article

Cite this article

Zhu, L., Meng, W., Liu, C. et al. Processing top-N relational queries by learning. J Intell Inf Syst 34, 21–55 (2010). https://doi.org/10.1007/s10844-009-0078-7

  • DOI: https://doi.org/10.1007/s10844-009-0078-7
