Fast Distributed Top-q and Top-k Query Processing

Transactions on Large-Scale Data- and Knowledge-Centered Systems XLI

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 11390)

Abstract

Top-k queries retrieve the k results of a query that score best under an objective function representing the user's preferences. To additionally require that the returned results satisfy these preferences to a certain degree, we introduce top-q queries, which return all results that match the user preferences to at least some minimum degree q. We show how top-q and top-k queries can be combined, enabling the user to pose a large number of interesting queries. Furthermore, we show that the computation of top-q queries can be integrated into algorithms that process top-k queries efficiently. We implemented our approach and evaluated it against the fastest threshold-based top-k query answering approach (BPA-2). Our experiments showed an improvement of one to two orders of magnitude in time and memory requirements. Finally, we show how such queries can be processed efficiently in highly distributed peer-to-peer databases, and we propose an adaptive algorithm that takes several parameters of the network of databases into account to optimize the processing of distributed top-k queries.
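
To make the query types in the abstract concrete, the following is a minimal sketch of their semantics only. It scores every record and sorts, so it is not the chapter's algorithm and has none of its performance properties. The weighted-mean objective, the assumption that attribute values and scores lie in [0, 1], and the names `weighted_score` and `top_q_k` are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, float]


def weighted_score(record: Record, weights: Dict[str, float]) -> float:
    """Hypothetical objective: weighted mean of attribute values assumed to lie in [0, 1]."""
    total = sum(weights.values())
    return sum(w * record[attr] for attr, w in weights.items()) / total


def top_q_k(
    records: List[Record],
    score: Callable[[Record], float],
    q: Optional[float] = None,
    k: Optional[int] = None,
) -> List[Tuple[float, int]]:
    """Naive evaluation: return (score, record index) pairs, best first.

    q only  -> top-q query: every record scoring at least q
    k only  -> top-k query: the k best-scoring records
    q and k -> combined query: at most k records, each scoring at least q
    """
    ranked = [(score(r), i) for i, r in enumerate(records)]
    # Sort by descending score; ties are broken by the original position.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    if q is not None:
        ranked = [(s, i) for s, i in ranked if s >= q]
    if k is not None:
        ranked = ranked[:k]
    return ranked


if __name__ == "__main__":
    # Made-up sample data, not taken from the chapter.
    data = [
        {"price": 0.9, "quality": 0.8},
        {"price": 0.4, "quality": 0.9},
        {"price": 0.7, "quality": 0.3},
        {"price": 0.2, "quality": 0.1},
    ]
    prefs = {"price": 1.0, "quality": 1.0}

    def objective(r: Record) -> float:
        return weighted_score(r, prefs)

    print("top-k (k=3):      ", top_q_k(data, objective, k=3))
    print("top-q (q=0.6):    ", top_q_k(data, objective, q=0.6))
    print("combined (q, k=1):", top_q_k(data, objective, q=0.6, k=1))
```

The combined form shows why the two parameters complement each other: k alone may return poorly matching records when few good ones exist, while q alone may return an unbounded number of results.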

The work reported here was supported by the Austrian Ministry for Science and Research within the projects GATIB II and BBMRI.AT.

Corresponding author

Correspondence to Johann Eder.

Copyright information

© 2019 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Dabringer, C., Eder, J. (2019). Fast Distributed Top-q and Top-k Query Processing. In: Hameurlain, A., Wagner, R., Dang, T. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLI. Lecture Notes in Computer Science, vol 11390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58808-6_1

  • DOI: https://doi.org/10.1007/978-3-662-58808-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-58807-9

  • Online ISBN: 978-3-662-58808-6

  • eBook Packages: Computer Science (R0)
