ANALYTICAL AND EXPERIMENTAL EVALUATION OF STREAM-BASED JOIN
Continuous queries over data streams have gained popularity as the breadth of possible applications, ranging from network monitoring to online pattern discovery, has increased. Joining streams is a fundamental problem that must be solved to enable complex queries over multiple streams. However, because streams can represent potentially infinite data, full join evaluation, as performed in traditional databases, is infeasible. Joins in a stream environment are therefore evaluated not over entire streams but over specific windows defined on the streams. In this paper, we present windowed implementations of the traditional nested loops and hash join algorithms. We evaluate the performance of these algorithms analytically and experimentally for different parameters, and find that, in general, a hash join provides better performance. We also investigate invalidation strategies for removing stale data from the window buffers, and propose an optimal strategy that balances processing time against buffer size.
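To make the idea of a windowed join concrete, the following is a minimal sketch of a symmetric hash join over two streams with a time-based sliding window. It is not the implementation evaluated in the paper: the class name `WindowedHashJoin`, the tuple layout, the assumption that timestamps arrive in non-decreasing order, and the eager invalidation on every arrival are all illustrative assumptions. The `probe_scan` method shows, for comparison, what a nested-loops style probe of the same window buffer might look like.

```python
from collections import defaultdict, deque


class WindowedHashJoin:
    """Hypothetical symmetric hash join over two streams (sides 0 and 1).

    Assumes timestamps are non-decreasing across all arriving tuples.
    """

    def __init__(self, window):
        self.window = window  # window length, in the same units as timestamps
        # Per stream: an arrival-ordered buffer and a hash index on the join key.
        self.queues = (deque(), deque())
        self.tables = (defaultdict(deque), defaultdict(deque))

    def _invalidate(self, side, now):
        """Drop tuples on `side` whose timestamp is <= now - window."""
        queue, table = self.queues[side], self.tables[side]
        while queue and queue[0][0] <= now - self.window:
            _, key, _ = queue.popleft()
            bucket = table[key]
            bucket.popleft()  # buckets are arrival-ordered, so the oldest expires first
            if not bucket:
                del table[key]

    def probe_scan(self, side, key):
        """Nested-loops style probe: scan the whole window buffer of `side`."""
        return [value for (_, k, value) in self.queues[side] if k == key]

    def insert(self, side, ts, key, value):
        """Process one arriving tuple and return the join results it produces."""
        # Eager invalidation (an assumption): expire stale tuples on both sides.
        for s in (0, 1):
            self._invalidate(s, ts)
        other = 1 - side
        # Hash probe: only the bucket for this key in the opposite window is read.
        matches = [v for (_, v) in self.tables[other].get(key, ())]
        if side == 0:
            results = [(value, m) for m in matches]
        else:
            results = [(m, value) for m in matches]
        # Store the new tuple so later arrivals on the other stream can join with it.
        self.queues[side].append((ts, key, value))
        self.tables[side][key].append((ts, value))
        return results
```

In this sketch the hash probe touches only one bucket, whereas `probe_scan` visits every tuple in the window, which is one intuition for why a hash join would tend to perform better as windows grow. Invalidating less often would save processing time per tuple at the cost of larger buffers, which is the kind of trade-off the invalidation strategies address.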
Keywords: Nest Loop, Total Processing Time, Query Optimizer, Streaming Data, Query Plan