Distributed and Parallel Databases, Volume 30, Issue 3–4, pp 239–271

Distributed top-k query processing by exploiting skyline summaries

  • Akrivi Vlachou
  • Christos Doulkeridis
  • Kjetil Nørvåg
Article

Abstract

Recently, a trend has been observed towards supporting rank-aware query operators, such as top-k, that enable users to retrieve only a limited set of the most interesting data objects. As data is now commonly distributed over multiple servers, a challenging problem is to support rank-aware queries in distributed environments. In this paper, we propose a novel approach, called DiTo, for efficient top-k processing over multiple servers, where each server autonomously stores a fraction of the data. Towards this goal, we exploit the inherent relationship between top-k and skyline objects, and we employ the skyline objects of servers as a data summarization mechanism for efficiently identifying the servers that store top-k results. Relying on a thresholding scheme, DiTo retrieves the top-k result set progressively, while minimizing the number of queried servers and the amount of transferred data. Furthermore, we extend DiTo to support data summarizations of bounded size, thus restricting the cost of summary distribution and maintenance. To this end, we study the challenging problem of finding a fixed-size abstraction of the skyline set that influences the performance of DiTo only slightly. Our experimental evaluation shows that DiTo performs efficiently and provides a viable solution when a high degree of distribution is required.
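The idea of using per-server skylines as summaries can be illustrated with a small sketch. This is not the DiTo algorithm itself, only a simplified illustration under stated assumptions: smaller attribute values are better, scoring functions are linear with non-negative weights, and all names (`skyline`, `best_possible_score`, `distributed_top_k`, the sample servers) are hypothetical. Under these assumptions the best-scoring object of a server always lies on that server's skyline, so the minimum score over the skyline bounds what the server can contribute, and servers whose bound cannot beat the current k-th score need not be queried.

```python
from math import inf
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def score(p: Point, weights: Sequence[float]) -> float:
    """Linear scoring function; assumed monotone (non-negative weights)."""
    return sum(w * x for w, x in zip(weights, p))


def best_possible_score(summary: Sequence[Point], weights: Sequence[float]) -> float:
    """Lower bound on the score of any object stored at the summarized server."""
    return min(score(p, weights) for p in summary)


def distributed_top_k(servers: Dict[str, List[Point]],
                      summaries: Dict[str, List[Point]],
                      weights: Sequence[float],
                      k: int) -> Tuple[List[Point], List[str]]:
    """Visit servers in order of their bound; stop once the bound cannot beat the k-th score."""
    order = sorted(servers, key=lambda s: best_possible_score(summaries[s], weights))
    result: List[Point] = []
    queried: List[str] = []
    for name in order:
        threshold = score(result[k - 1], weights) if len(result) >= k else inf
        if best_possible_score(summaries[name], weights) >= threshold:
            break  # this server and all later ones cannot improve the result
        # Stand-in for a remote call: the server returns its local top-k.
        local = sorted(servers[name], key=lambda p: score(p, weights))[:k]
        result = sorted(result + local, key=lambda p: score(p, weights))[:k]
        queried.append(name)
    return result, queried


# Hypothetical sample data: three servers holding 2-dimensional objects.
servers = {
    "A": [(1, 9), (2, 2), (8, 1), (9, 9)],
    "B": [(5, 5), (6, 4), (7, 7)],
    "C": [(3, 3), (4, 8), (9, 2)],
}
summaries = {name: skyline(points) for name, points in servers.items()}
result, queried = distributed_top_k(servers, summaries, weights=(0.5, 0.5), k=2)
```

In this sample run, server B is never contacted, because the best score its skyline allows is no better than the second-best score already found on servers A and C. The sketch uses the full skyline as the summary; the bounded-size summaries mentioned in the abstract are not modeled here.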

Keywords

Top-k queries, Skyline operator, Distributed databases

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Akrivi Vlachou (1)
  • Christos Doulkeridis (1)
  • Kjetil Nørvåg (1)

  1. Dept. of Computer Science, NTNU, Trondheim, Norway
