Abstract
Many processes in telecommunications (e.g., network monitoring) generate very large amounts (many terabytes) of data. This data is stored in a data warehouse and used for data mining and analysis. Many analyses require joining several very large data sets, and conventional methods for performing these joins are prohibitively expensive. However, one can often exploit the temporal nature of the data and the join conditions to obtain fast algorithms that operate entirely in memory. In this paper, we describe such a join algorithm (the window join) together with a method for analyzing queries to determine when and how the window join should be applied. The window join makes sequential scans over the input data, allowing the use of tape storage. We have used the techniques described in this paper on a large IP data warehouse.
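The abstract does not spell out how the window join works. The sketch below illustrates the general idea under stated assumptions, and it is not the authors' algorithm. It assumes that both inputs are already sorted by timestamp and that the join condition pairs records with equal keys whose timestamps differ by at most a fixed window. It merges the two inputs in a single sequential pass and keeps only records that can still produce a match. The names `window_join` and `_evict`, the `(timestamp, key, payload)` record layout, and the `window` parameter are hypothetical and chosen for illustration.

```python
from collections import deque
from typing import Any, Deque, Iterable, Iterator, Tuple

# Hypothetical record layout: (timestamp, join key, payload).
Record = Tuple[float, Any, Any]


def _evict(buf: Deque[Record], ts: float, window: float) -> None:
    """Drop buffered records too old to match a record at time ts or later."""
    while buf and buf[0][0] < ts - window:
        buf.popleft()


def window_join(left: Iterable[Record], right: Iterable[Record],
                window: float) -> Iterator[Tuple[Record, Record]]:
    """Yield (l, r) with l.key == r.key and |l.ts - r.ts| <= window.

    Assumption: both inputs are sorted by timestamp. Each input is read
    once, sequentially; only records inside the window stay in memory.
    """
    left_buf: Deque[Record] = deque()
    right_buf: Deque[Record] = deque()
    li, ri = iter(left), iter(right)
    l, r = next(li, None), next(ri, None)
    while l is not None or r is not None:
        # Merge step: process whichever stream has the smaller timestamp.
        take_left = r is None or (l is not None and l[0] <= r[0])
        rec = l if take_left else r
        ts = rec[0]
        # Later records have timestamps >= ts, so anything older than
        # ts - window can never match again.
        _evict(left_buf, ts, window)
        _evict(right_buf, ts, window)
        if take_left:
            # Match against right records already seen (timestamps <= ts).
            for rr in right_buf:
                if rr[1] == rec[1]:
                    yield (rec, rr)
            left_buf.append(rec)
            l = next(li, None)
        else:
            for ll in left_buf:
                if ll[1] == rec[1]:
                    yield (ll, rec)
            right_buf.append(rec)
            r = next(ri, None)


if __name__ == "__main__":
    left = [(0, "a", "L0"), (5, "b", "L1"), (12, "a", "L2")]
    right = [(1, "a", "R0"), (6, "a", "R1"), (7, "b", "R2"), (20, "a", "R3")]
    print(list(window_join(left, right, 3)))
    # [((0, 'a', 'L0'), (1, 'a', 'R0')), ((5, 'b', 'L1'), (7, 'b', 'R2'))]
```

Each matching pair is emitted exactly once, when the later of its two records is read. In this sketch, memory use depends on how many records fall inside one window, not on the total size of the inputs. Because both inputs are read strictly in order, the approach is compatible with sequential-only storage such as tape, which is consistent with the property the abstract highlights.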
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Johnson, T., Chatziantoniou, D. (2000). Joining Very Large Data Sets. In: Jonker, W. (eds) Databases in Telecommunications. DBTel 1999. Lecture Notes in Computer Science, vol 1819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10721056_9
DOI: https://doi.org/10.1007/10721056_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67667-6
Online ISBN: 978-3-540-45100-6
eBook Packages: Springer Book Archive