Abstract
Top-k queries retrieve the k results of a query that score best under an objective function representing the preferences of users. To additionally require that the returned results satisfy these preferences to a certain degree, we introduce top-q queries, which return all results that approximate the user preferences to at least some minimum degree q. We show how top-q and top-k queries can be combined, enabling users to pose a wide range of interesting queries. We also show that the computation of top-q queries can be integrated into algorithms that efficiently process top-k queries. We implemented our approach and evaluated it against the fastest threshold-based top-k query answering approaches (BPA-2). Our experiments showed an improvement of one to two orders of magnitude in time and memory requirements. Furthermore, we show how such queries can be processed efficiently in highly distributed peer-to-peer databases, and we propose an adaptive algorithm that takes several parameters of the database network into account to optimize the processing of distributed top-k queries.
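To make the three query types concrete, the following minimal Python sketch contrasts a top-k query, a top-q query, and their combination. It only illustrates the semantics described above using a naive full scan; it is not the threshold-based algorithm evaluated in the chapter. The function names (`top_k`, `top_q`, `top_k_q`), the sample data, and the assumption that scores are normalized to the range [0, 1] are hypothetical and introduced here for illustration.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, float]  # (identifier, preference score); hypothetical layout


def top_k(rows: Iterable[Row], score: Callable[[Row], float], k: int) -> List[Row]:
    """Return the k best-scoring rows, best first."""
    return heapq.nlargest(k, rows, key=score)


def top_q(rows: Iterable[Row], score: Callable[[Row], float], q: float) -> List[Row]:
    """Return every row whose score is at least q, best first."""
    qualifying = [row for row in rows if score(row) >= q]
    return sorted(qualifying, key=score, reverse=True)


def top_k_q(rows: Iterable[Row], score: Callable[[Row], float],
            k: int, q: float) -> List[Row]:
    """Combined query: at most k rows, each with a score of at least q."""
    qualifying = (row for row in rows if score(row) >= q)
    return heapq.nlargest(k, qualifying, key=score)


# Assumed sample data: scores are taken to be normalized to [0, 1].
SAMPLE: List[Row] = [("A", 0.9), ("B", 0.75), ("C", 0.6), ("D", 0.3)]


def by_score(row: Row) -> float:
    """Objective function for the sample: the stored score itself."""
    return row[1]
```

With this sample, `top_k(SAMPLE, by_score, 3)` returns three rows regardless of how well they match, whereas `top_k_q(SAMPLE, by_score, 3, 0.7)` returns only the two rows that reach the minimum degree 0.7. The combined form thus bounds both the size and the quality of the answer.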
The work reported here was supported by the Austrian Ministry for Science and Research within the projects GATIB II and BBMRI.AT.
Copyright information
© 2019 Springer-Verlag GmbH Germany, part of Springer Nature
About this chapter
Cite this chapter
Dabringer, C., Eder, J. (2019). Fast Distributed Top-q and Top-k Query Processing. In: Hameurlain, A., Wagner, R., Dang, T. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLI. Lecture Notes in Computer Science, vol 11390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58808-6_1
DOI: https://doi.org/10.1007/978-3-662-58808-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58807-9
Online ISBN: 978-3-662-58808-6
eBook Packages: Computer Science (R0)