Abstract
Three join algorithms are evaluated in an environment of distributed, main-memory-based mediators and data sources. A streamed ship-out join ships bulks of tuples to a mediator near a data source, followed by post-processing in the client. An extended streamed semi-join additionally builds a main-memory hash index in the client mediator. A ship-in algorithm materializes the data and joins it in the client mediator.
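To make the three strategies concrete, the following Python sketch models them in a single process. It is an illustration under stated assumptions, not the authors' implementation. The `Channel` class only counts simulated round trips and source calls. `source` stands for a hypothetical parameterized source (for example, a search engine queried with a keyword), and `fetch_all` is a hypothetical non-parameterized source used by the ship-in join. Keeping the client hash index across bulks in the semi-join is also an assumption.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical inner data, keyed by join value.
INNER = {"a": ["a1", "a2"], "b": ["b1"], "c": ["c1"]}


def source(key: str) -> list[str]:
    """Hypothetical parameterized source: needs a join value to answer."""
    return INNER.get(key, [])


def fetch_all() -> list[tuple[str, str]]:
    """Hypothetical non-parameterized source: returns the whole relation."""
    return [(k, v) for k, vs in INNER.items() for v in vs]


class Channel:
    """Toy stand-in for the network: counts round trips and source calls."""

    def __init__(self) -> None:
        self.messages = 0
        self.source_calls = 0


def bulks(items: Iterable, size: int) -> Iterator[list]:
    """Group a stream into lists of at most `size` items."""
    batch: list = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def ship_out_join(outer, ch: Channel, bulk_size: int = 100):
    """Ship each bulk of outer tuples to the mediator near the source,
    which queries the source once per tuple; the client post-processes."""
    for batch in bulks(outer, bulk_size):
        ch.messages += 1                   # one round trip per bulk
        for key, payload in batch:
            ch.source_calls += 1           # evaluated near the source
            for match in source(key):
                yield (key, payload, match)


def streamed_semi_join(outer, ch: Channel, bulk_size: int = 100):
    """Per bulk, ship only join values not yet indexed, build a main-memory
    hash index in the client and probe it (index kept across bulks: assumption)."""
    index: dict[str, list[str]] = {}
    for batch in bulks(outer, bulk_size):
        new_keys = list(dict.fromkeys(k for k, _ in batch if k not in index))
        if new_keys:                       # skip the round trip if all keys are cached
            ch.messages += 1
            for key in new_keys:
                ch.source_calls += 1
                index[key] = source(key)
        for key, payload in batch:         # probe the client-side hash index
            for match in index[key]:
                yield (key, payload, match)


def ship_in_join(outer, ch: Channel):
    """Materialize the whole inner source in the client, then hash-join."""
    ch.messages += 1
    index: dict[str, list[str]] = defaultdict(list)
    for key, value in fetch_all():
        index[key].append(value)
    for key, payload in outer:
        for match in index.get(key, []):
            yield (key, payload, match)


# Outer stream with duplicate join values ("a" appears three times).
outer = [("a", 1), ("b", 2), ("a", 3), ("d", 4), ("a", 5)]
ch_out, ch_semi, ch_in = Channel(), Channel(), Channel()
res_out = list(ship_out_join(outer, ch_out, bulk_size=2))
res_semi = list(streamed_semi_join(outer, ch_semi, bulk_size=2))
res_in = list(ship_in_join(outer, ch_in))
```

All three sketches return the same join result. With duplicate join values in the outer stream, the semi-join sketch makes fewer source calls and round trips than the ship-out join. The ship-in join needs a single transfer but only applies when the whole source can be retrieved without parameters.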
The first two algorithms are suitable for sources that require parameters to execute a query, such as web search engines and computational software; the last is suitable otherwise. We compare the execution times for retrieving all tuples and the first N tuples, and analyze the percentage of time spent in each subsystem while varying the network communication speed, the bulk size, and the amount of duplicate data. The choice of join algorithm leads to performance differences of orders of magnitude across mediation environments.
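The streamed algorithms produce results bulk by bulk, so the first N tuples can be returned before the whole outer stream has been shipped. The bulk size then trades the number of round trips against the amount of data sent before the first result arrives. The hedged Python sketch below illustrates this with a generator. `source` and `streamed_join` are hypothetical names, and counting shipped bulks is only a rough proxy for the execution times studied in the paper.

```python
import itertools
import math


def source(key: int) -> list[int]:
    """Hypothetical parameterized source: one match per join value."""
    return [key * 2]


def streamed_join(outer: list[int], bulk_size: int, shipped: list[int]):
    """Lazily ship the outer stream in bulks; `shipped` logs bulk sizes,
    one entry per simulated network round trip."""
    for start in range(0, len(outer), bulk_size):
        batch = outer[start:start + bulk_size]
        shipped.append(len(batch))
        for key in batch:
            for match in source(key):
                yield (key, match)


outer = list(range(1000))

# First N tuples: the generator stops pulling bulks once N results exist.
shipped_first: list[int] = []
first_10 = list(itertools.islice(streamed_join(outer, 100, shipped_first), 10))

# All tuples: every bulk is shipped.
shipped_all: list[int] = []
all_rows = list(streamed_join(outer, 100, shipped_all))

# A larger bulk means fewer round trips but more data shipped before the first result.
shipped_big: list[int] = []
first_10_big = list(itertools.islice(streamed_join(outer, 1000, shipped_big), 10))
round_trips = {b: math.ceil(len(outer) / b) for b in (1, 10, 100, 1000)}
```

In this toy model a bulk size of 1 minimizes the data sent before the first tuple but needs 1000 round trips for the full result. The real trade-off also depends on network speed and processing costs, which the sketch does not model.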
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Josifovski, V., Katchaounov, T., Risch, T. (2001). Evaluation of Join Strategies for Distributed Mediation. In: Caplinskas, A., Eder, J. (eds) Advances in Databases and Information Systems. ADBIS 2001. Lecture Notes in Computer Science, vol 2151. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44803-9_24
DOI: https://doi.org/10.1007/3-540-44803-9_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42555-7
Online ISBN: 978-3-540-44803-7
eBook Packages: Springer Book Archive