Abstract
We are witnessing a dramatic increase in the use of data-centric distributed systems such as global grid infrastructures, sensor networks, network monitoring applications, and various publish-subscribe systems. Visions of massive demand-driven data dissemination, intensive processing, and intelligent fusion for building dynamic knowledge bases, which seemed infeasible just a few years ago, are about to come true. However, realizing this potential demands adequate support from middleware that can be used to deploy and support such systems.
We propose a peer-to-peer-based distributed indexing architecture that supports scalable handling of intense dynamic information flows. The suggested indexing scheme is geared towards providing timely responses to queries of different types with varying precision requirements while minimizing the use of network and computational resources. Our solution inherits the capabilities provided by peer-to-peer architectures, such as scalability and load balancing of communication, as well as adaptivity in the presence of dynamic changes. The paper elaborates on the database and peer-to-peer methodologies used in the integrated solution, as well as the non-trivial interaction between them, thereby providing valuable feedback to the designers of these techniques.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bulut, A., Vitenberg, R., Emekçi, F., Singh, A.K. (2004). An Adaptive and Scalable Middleware for Distributed Indexing of Data Streams. In: Aberer, K., Koubarakis, M., Kalogeraki, V. (eds) Databases, Information Systems, and Peer-to-Peer Computing. DBISP2P 2003. Lecture Notes in Computer Science, vol 2944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24629-9_10
DOI: https://doi.org/10.1007/978-3-540-24629-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20968-3
Online ISBN: 978-3-540-24629-9
eBook Packages: Springer Book Archive