Abstract
Online monitoring of data streams poses a challenge in many data-centric applications, including network traffic management, trend analysis, web-click streams, intrusion detection, and sensor networks. Indexing techniques used in these applications have to be time- and space-efficient while providing high-quality answers to two classes of user queries: (1) queries that monitor aggregates, such as finding surprising levels (the “volatility” of a data stream) and detecting bursts, and (2) queries that monitor trends, such as detecting correlations and finding similar patterns. Data stream indexing becomes an even more challenging task when we take into account the dynamic nature of the underlying raw data. For example, bursts of events can occur at variable temporal modalities, from hours to days to weeks. To address this, we focus on a multi-resolution indexing architecture. The architecture enables the discovery of “interesting” behavior online, provides flexibility in user query definitions, and interconnects registered queries for real-time and in-depth analysis.
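The point that bursts can occur at different temporal modalities is easier to see with a small example. The sketch below is hypothetical and does not reproduce the chapter's architecture, whose details are not given in the abstract. It monitors one aggregate (the sum) over sliding windows of dyadic sizes 1, 2, 4, ..., 2^(L-1) and flags a burst at every window size whose sum exceeds a per-level threshold. The class name, method names, and the fixed-threshold rule are assumptions chosen for illustration. The sums are maintained exactly, so memory grows with the largest window. A real multi-resolution index would store approximate summaries instead.

```python
from collections import deque


class MultiResolutionBurstDetector:
    """Hypothetical sketch: sliding-window sums at dyadic resolutions.

    Level k watches the most recent 2**k values and reports a burst when
    their sum exceeds thresholds[k]. This is an illustrative assumption,
    not the index structure described in the chapter.
    """

    def __init__(self, num_levels, thresholds):
        if len(thresholds) != num_levels:
            raise ValueError("need exactly one threshold per level")
        self.thresholds = list(thresholds)
        self.window_sizes = [2 ** k for k in range(num_levels)]
        # Retain only as many raw values as the largest window needs.
        self.history = deque(maxlen=self.window_sizes[-1])
        self.sums = [0] * num_levels

    def update(self, value):
        """Consume one stream value and return the window sizes that burst."""
        for k, w in enumerate(self.window_sizes):
            # Once the window is full, the value w steps back slides out.
            if len(self.history) >= w:
                self.sums[k] -= self.history[-w]
            self.sums[k] += value
        self.history.append(value)
        # Report only windows that are full and over their threshold.
        return [
            w
            for k, w in enumerate(self.window_sizes)
            if len(self.history) >= w and self.sums[k] > self.thresholds[k]
        ]
```

A usage example: with `MultiResolutionBurstDetector(3, [5, 8, 12])` and the stream `1, 1, 1, 10, 1`, the spike at the fourth step is flagged at window sizes 1, 2, and 4. One step later it is flagged only at sizes 2 and 4, because the spike has left the size-1 window but still dominates the longer ones. Checking all resolutions at once like this is how one query can catch bursts at several time scales.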
Copyright information
© 2007 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Bulut, A., Singh, A.K. (2007). Indexing and Querying Data Streams. In: Aggarwal, C.C. (ed.) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_11
DOI: https://doi.org/10.1007/978-0-387-47534-9_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-28759-1
Online ISBN: 978-0-387-47534-9
eBook Packages: Computer Science, Computer Science (R0)