An Adaptive and Scalable Middleware for Distributed Indexing of Data Streams

Bulut, Ahmet; Vitenberg, Roman; Emekçi, Fatih; Singh, Ambuj K.

doi:10.1007/978-3-540-24629-9_10

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2944)

Abstract

We are witnessing a dramatic increase in the use of data-centric distributed systems such as global grid infrastructures, sensor networks, network monitoring, and various publish-subscribe systems. The visions of massive demand-driven data dissemination, intensive processing, and intelligent fusion in order to build dynamic knowledge bases that seemed infeasible just a few years ago are about to come true. However, the realization of this potential demands adequate support from middleware that could be used to deploy and support such systems.

We propose a peer-to-peer based distributed indexing architecture that supports scalable handling of intense dynamic information flows. The suggested indexing scheme is geared towards providing timely responses to queries of different types with varying precision requirements while minimizing the use of network and computational resources. Our solution leverages the capabilities provided by peer-to-peer architectures, such as scalability and load balancing of communication, as well as adaptivity in the presence of dynamic changes. The paper elaborates on the database and peer-to-peer methodologies used in the integrated solution, as well as the non-trivial interaction between them, thereby providing valuable feedback to the designers of these techniques.

Author information

Authors and Affiliations

Computer Science Department, UCSB, Santa Barbara, CA, 93106, USA
Ahmet Bulut, Roman Vitenberg, Fatih Emekçi & Ambuj K. Singh

Editor information

Editors and Affiliations

School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland
Karl Aberer
National and Kapodistrian University of Athens, Greece
Manolis Koubarakis
Department of Computer Science and Engineering, University of California, Riverside, CA, 92521, USA
Vana Kalogeraki

About this paper

Cite this paper

Bulut, A., Vitenberg, R., Emekçi, F., Singh, A.K. (2004). An Adaptive and Scalable Middleware for Distributed Indexing of Data Streams. In: Aberer, K., Koubarakis, M., Kalogeraki, V. (eds) Databases, Information Systems, and Peer-to-Peer Computing. DBISP2P 2003. Lecture Notes in Computer Science, vol 2944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24629-9_10

DOI: https://doi.org/10.1007/978-3-540-24629-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20968-3
Online ISBN: 978-3-540-24629-9
eBook Packages: Springer Book Archive