
Joining Very Large Data Sets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1819)

Abstract

Many processes in telecommunications (e.g., network monitoring) generate very large amounts (many terabytes) of data. This data is stored in a data warehouse and used for data mining and analysis. Many analyses require the join of several very large data sets. Conventional methods for performing these joins are prohibitively expensive. However, one can often exploit the temporal nature of the data and the join conditions to obtain fast algorithms that operate entirely in memory. In this paper, we describe such a join algorithm (the window join) together with a method for analyzing queries to determine when and how the window join should be applied. The window join makes sequential scans over the input data, allowing the use of tape storage. We have used the techniques described in this paper on a large IP data warehouse.
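The abstract does not spell out the window join, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes both inputs are already sorted by timestamp and that the join condition is key equality plus a bounded timestamp difference. The name `window_join`, the `window` parameter, and the `(timestamp, key, payload)` record layout are all hypothetical. Under those assumptions each input is read once, front to back, and only records inside the current time window are held in memory.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

# Hypothetical record layout: (timestamp, key, payload).
Record = Tuple[int, str, str]


def window_join(
    left: Iterable[Record],
    right: Iterable[Record],
    window: int,
) -> Iterator[Tuple[Record, Record]]:
    """Yield (l, r) pairs with equal keys and |l.ts - r.ts| <= window.

    Assumes both inputs are sorted by timestamp (an assumption of this
    sketch). Each input is consumed once, sequentially.
    """
    buffer: Deque[Record] = deque()  # right records that may still match
    right_iter = iter(right)
    pending = next(right_iter, None)  # next unread right record, if any

    for l in left:
        l_ts, l_key, _ = l

        # Read right records up to the upper edge of the window.
        while pending is not None and pending[0] <= l_ts + window:
            buffer.append(pending)
            pending = next(right_iter, None)

        # Evict right records below the lower edge. Because left is
        # sorted, they cannot match any later left record either.
        while buffer and buffer[0][0] < l_ts - window:
            buffer.popleft()

        # Everything left in the buffer is within the time window;
        # only the key still needs checking.
        for r in buffer:
            if r[1] == l_key:
                yield (l, r)


if __name__ == "__main__":
    left_rows = [(1, "a", "L1"), (5, "b", "L2"), (10, "a", "L3")]
    right_rows = [(0, "a", "R1"), (4, "a", "R2"), (6, "b", "R3"), (20, "a", "R4")]
    for l, r in window_join(left_rows, right_rows, window=2):
        print(l[2], r[2])
```

Memory use in this sketch is bounded by the number of right-hand records that fall inside one window, not by the size of either input, which is what makes a single sequential pass (for example, from tape) workable.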




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Johnson, T., Chatziantoniou, D. (2000). Joining Very Large Data Sets. In: Jonker, W. (ed.) Databases in Telecommunications. DBTel 1999. Lecture Notes in Computer Science, vol 1819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10721056_9


  • DOI: https://doi.org/10.1007/10721056_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67667-6

  • Online ISBN: 978-3-540-45100-6

  • eBook Packages: Springer Book Archive
