Fast Distributed Top-q and Top-k Query Processing

Dabringer, Claus; Eder, Johann

doi:10.1007/978-3-662-58808-6_1


Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 11390)


Abstract

Top-k queries retrieve the k results of a query that score best under an objective function representing the user's preferences. To additionally require that the returned results satisfy these preferences to a certain degree, we introduce top-q queries, which return all results that approximate the user preferences to at least some minimum degree q. We show how top-q and top-k queries can be combined, enabling the user to pose a large number of interesting queries. Furthermore, we show that the calculation of top-q queries can be integrated into algorithms that efficiently process top-k queries. We implemented our approach and evaluated it against the fastest threshold-based top-k query answering approach (BPA-2). Our experiments showed an improvement of one to two orders of magnitude in time and memory requirements. Finally, we show how such queries can be processed efficiently in highly distributed peer-to-peer databases, and we propose an adaptive algorithm that takes several parameters of the network of databases into account to optimize the processing of distributed top-k queries.
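
To make the difference between the two query types concrete, the following is a minimal sketch and not the authors' algorithm or BPA-2. It assumes a weighted-sum objective function over per-attribute scores already normalized to [0, 1], and it scans all items exhaustively. The function names, the sample data, and the weights are hypothetical and chosen only for illustration.

```python
import heapq

# Hypothetical sample data: each attribute holds a preference score in [0, 1].
ITEMS = [
    {"id": "a", "price": 0.9, "rating": 0.8},
    {"id": "b", "price": 0.5, "rating": 0.7},
    {"id": "c", "price": 0.2, "rating": 0.4},
    {"id": "d", "price": 0.6, "rating": 0.2},
]
# Hypothetical user preferences: both attributes count equally.
WEIGHTS = {"price": 1.0, "rating": 1.0}


def score(item, weights):
    """Assumed objective function: weighted mean of attribute scores, in [0, 1]."""
    total = sum(weights.values())
    return sum(w * item[attr] for attr, w in weights.items()) / total


def top_k(items, weights, k):
    """Return the k best-scoring items, regardless of how good they are."""
    return heapq.nlargest(k, items, key=lambda it: score(it, weights))


def top_q(items, weights, q):
    """Return every item whose score reaches the minimum degree q, best first."""
    hits = [it for it in items if score(it, weights) >= q]
    return sorted(hits, key=lambda it: score(it, weights), reverse=True)


def top_k_and_q(items, weights, k, q):
    """Combined query: at most k items, each satisfying the degree q."""
    return top_q(items, weights, q)[:k]


if __name__ == "__main__":
    print([it["id"] for it in top_k(ITEMS, WEIGHTS, 3)])
    print([it["id"] for it in top_q(ITEMS, WEIGHTS, 0.5)])
    print([it["id"] for it in top_k_and_q(ITEMS, WEIGHTS, 3, 0.5)])
```

In this sketch a plain top-k query with k = 3 always returns three items, even if the third is a poor match, whereas the combined query may return fewer than k items because every result must also reach the degree q.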

The work reported here was supported by the Austrian Ministry for Science and Research within the projects GATIB II and BBMRI.AT.



Author information

Authors and Affiliations

Department of Informatics-Systems, Alpen-Adria Universität, Klagenfurt, Austria
Claus Dabringer & Johann Eder

Authors

Claus Dabringer
Johann Eder

Corresponding author

Correspondence to Johann Eder.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Roland Wagner
Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Tran Khanh Dang


About this chapter

Cite this chapter

Dabringer, C., Eder, J. (2019). Fast Distributed Top-q and Top-k Query Processing. In: Hameurlain, A., Wagner, R., Dang, T. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLI. Lecture Notes in Computer Science, vol 11390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58808-6_1


DOI: https://doi.org/10.1007/978-3-662-58808-6_1
Published: 07 February 2019
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58807-9
Online ISBN: 978-3-662-58808-6
eBook Packages: Computer Science, Computer Science (R0)
