Handling Query Skew in Large Indexes: A View Based Approach

  • Weihuang Huang
  • Jeffrey Xu Yu
  • Zechao Shang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9093)


Indexing is one of the most important techniques for facilitating query processing over a multi-dimensional dataset. A commonly used strategy is to keep the tree-structured index balanced. This strategy implicitly assumes that queries are issued uniformly, partly because the query distribution cannot be known in advance and changes over time in practice. A key question we study in this work is whether it is best to rely fully on a balanced tree-structured index, particularly as datasets grow larger and larger. When a dataset becomes very large, it is unreasonable to assume that the data in every subspace are equally important and are accessed uniformly by all queries at the index level. Given the existence of query skew, in this paper we study how to handle such skew at the index level without sacrificing support for any possible query in a well-balanced tree index and without incurring a high overhead. To tackle the issue, we propose index-views at the index level, where an index-view is a short-cut in a balanced tree-structured index to the objects in a subspace that is accessed more frequently, and we propose a new index-view-centric framework that processes queries using index-views in a bottom-up manner. We also study the index-view selection problem, and we confirm the effectiveness of our approach using large real and synthetic datasets.
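The idea of an index-view as a short-cut into a balanced tree can be illustrated with a minimal sketch. It is not the authors' algorithm or data structure: the abstract does not specify either, so everything below is an assumption made for illustration. The sketch uses a small 2-D k-d tree; `Node`, `build`, `query`, the choice of `hot` as the frequently accessed subtree, and the rule "start at the smallest view whose bounding box strictly contains the query, otherwise fall back to the root" are all hypothetical. Trying the most specific view first is only one possible reading of "bottom-up".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    box: Box                                            # tight bounding box of the subtree
    points: List[Point] = field(default_factory=list)   # filled for leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points, depth=0, leaf_size=4):
    """Build a balanced k-d tree by median splits, alternating x and y."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    box = (min(xs), min(ys), max(xs), max(ys))
    if len(points) <= leaf_size:
        return Node(box, list(points))
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(box, [],
                build(ordered[:mid], depth + 1, leaf_size),
                build(ordered[mid:], depth + 1, leaf_size))


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def strictly_inside(q: Box, box: Box) -> bool:
    # Strict containment keeps the short-cut safe even when points outside
    # the subtree share a boundary coordinate with it.
    return box[0] < q[0] and box[1] < q[1] and q[2] < box[2] and q[3] < box[3]


def area(box: Box) -> float:
    return (box[2] - box[0]) * (box[3] - box[1])


def range_query(node, q, visited):
    """Top-down range search from `node`; records every node touched."""
    visited.append(node)
    if not intersects(node.box, q):
        return []
    if node.left is None:
        return [p for p in node.points
                if q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3]]
    return range_query(node.left, q, visited) + range_query(node.right, q, visited)


def query(root, views, q):
    """Start from the smallest view covering q (assumed rule), else the root."""
    start = root
    for view in sorted(views, key=lambda n: area(n.box)):
        if strictly_inside(q, view.box):
            start = view
            break
    visited = []
    return range_query(start, q, visited), len(visited)


# Hypothetical workload: queries concentrate in one corner of an 8 x 8 grid.
points = [(x, y) for x in range(8) for y in range(8)]
root = build(points)
hot = root.left.left            # assumed frequently accessed subtree
q = (0.5, 0.5, 2.5, 2.5)
with_view, cost_view = query(root, [hot], q)
without_view, cost_root = query(root, [], q)
```

Both calls return the same points, but the call that uses the view touches fewer nodes because it skips the upper levels of the tree. A query that no view covers still starts at the root, so the balanced index continues to answer every possible query.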





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. The Chinese University of Hong Kong, Hong Kong, China
