An Adaptive and Scalable Middleware for Distributed Indexing of Data Streams

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2944)

Abstract

We are witnessing a dramatic increase in the use of data-centric distributed systems such as global grid infrastructures, sensor networks, network monitoring, and various publish-subscribe systems. The visions of massive demand-driven data dissemination, intensive processing, and intelligent fusion in order to build dynamic knowledge bases that seemed infeasible just a few years ago are about to come true. However, the realization of this potential demands adequate support from middleware that could be used to deploy and support such systems.

We propose a peer-to-peer based distributed indexing architecture that supports scalable handling of intense dynamic information flows. The suggested indexing scheme is geared towards providing timely responses to queries of different types with varying precision requirements while minimizing the use of network and computational resources. Our solution inherits the capabilities of peer-to-peer architectures, such as scalability and load balancing of communication, as well as adaptivity in the presence of dynamic changes. The paper elaborates on the database and peer-to-peer methodologies used in the integrated solution as well as the non-trivial interaction between them, thereby providing valuable feedback to the designers of these techniques.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bulut, A., Vitenberg, R., Emekçi, F., Singh, A.K. (2004). An Adaptive and Scalable Middleware for Distributed Indexing of Data Streams. In: Aberer, K., Koubarakis, M., Kalogeraki, V. (eds) Databases, Information Systems, and Peer-to-Peer Computing. DBISP2P 2003. Lecture Notes in Computer Science, vol 2944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24629-9_10

  • DOI: https://doi.org/10.1007/978-3-540-24629-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20968-3

  • Online ISBN: 978-3-540-24629-9

  • eBook Packages: Springer Book Archive
