Data Streams, pp 309–331

Algorithms for Distributed Data Stream Mining

  • Chapter

Part of the book series: Advances in Database Systems (ADBS, volume 31)

Abstract

The field of Distributed Data Mining (DDM) deals with the problem of analyzing data while paying careful attention to distributed computing, storage, communication, and human-factor-related resources. Unlike traditional centralized systems, DDM offers a fundamentally distributed solution that analyzes data without necessarily requiring it to be collected at a single central site. This chapter presents an introduction to distributed data mining for continuous streams. It focuses on situations where the data observed at different locations change with time. The chapter surveys the literature and illustrates the behavior of this class of algorithms by exploring two very different types of techniques: one for the peer-to-peer environment and another for the hierarchical distributed environment. The chapter also briefly discusses several applications of these algorithms.


Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Bhaduri, K., Das, K., Sivakumar, K., Kargupta, H., Wolff, R., Chen, R. (2007). Algorithms for Distributed Data Stream Mining. In: Aggarwal, C.C. (eds) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_14

  • DOI: https://doi.org/10.1007/978-0-387-47534-9_14

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-28759-1

  • Online ISBN: 978-0-387-47534-9

  • eBook Packages: Computer Science, Computer Science (R0)
