© 2016

Data Stream Management

Processing High-Speed Data Streams

  • Minos Garofalakis
  • Johannes Gehrke
  • Rajeev Rastogi


  • Comprehensive introduction to the algorithmic and theoretical foundations of data stream processing, from basic mathematical models, algorithms, and analytics to more advanced streaming algorithms and techniques

  • Provides a thorough discussion of system and language aspects of data stream processing, through surveys of influential system prototypes and language designs

  • Discusses representative applications of data stream processing techniques in different domains, including network management, financial analytics, publish/subscribe engines, and time-series analysis

  • Includes an overview of current data streaming products and new streaming application domains, such as cloud computing and complex event processing


Part of the Data-Centric Systems and Applications book series (DCSA)

Table of contents

  1. Front Matter
  2. Minos Garofalakis, Johannes Gehrke, Rajeev Rastogi
  3. Foundations and Basic Stream Synopses

    1. Front Matter
    2. Michael B. Greenwald, Sanjeev Khanna
    3. Graham Cormode, Minos Garofalakis
    4. Phillip B. Gibbons
    5. Mayur Datar, Rajeev Motwani
  4. Mining Data Streams

    1. Front Matter
    2. Sudipto Guha, Nina Mishra
    3. Geoff Hulten, Pedro Domingos
    4. Gurmeet Singh Manku
  5. Advanced Topics

    1. Front Matter
    2. Alin Dobra, Minos Garofalakis, Johannes Gehrke, Rajeev Rastogi
    3. S. Muthukrishnan, Martin Strauss
    4. Graham Cormode, Piotr Indyk
    5. Minos Garofalakis
  6. System Architectures and Languages

    1. Front Matter

About this book


We live in the era of “Big Data”: petabytes of digital information are generated daily and need to be processed and analyzed for interesting patterns and trends. Besides volume, a defining characteristic of Big Data is its velocity; that is, data arrives in the form of continuous, high-speed data streams that need to be processed and analyzed on a continuous (24x7) basis. Such data streams pose very difficult challenges for conventional data-management architectures, which are built primarily on the concept of persistent, static data collections. This volume focuses on the theory and practice of data stream management, and the novel challenges this emerging domain poses for data-management algorithms, systems, and applications. The collection of chapters, contributed by authorities in the field, offers a comprehensive introduction to both the algorithmic/theoretical foundations of data streams and the streaming systems and applications built in different domains.

A short introductory chapter summarizes some basic data streaming concepts and models, and discusses the key elements of a generic stream query processing architecture. Subsequently, Part I focuses on basic streaming algorithms for some key analytics functions (e.g., quantiles, norms, join aggregates, heavy hitters) over streaming data. Part II then examines important techniques for basic stream mining tasks (e.g., clustering, classification, frequent itemsets). Part III discusses a number of advanced topics on stream processing algorithms, and Part IV focuses on system and language aspects of data stream processing with surveys of influential system prototypes and language designs. Part V then presents some representative applications of streaming techniques in different domains (e.g., network management, financial analytics). Finally, the volume concludes with an overview of current data streaming products and new application domains (e.g., cloud computing, big data analytics, and complex event processing), and a discussion of future directions in this exciting field.

The book provides a comprehensive overview of core concepts and technological foundations, as well as various systems and applications, and is of particular interest to students, lecturers and researchers in the area of data stream management.



Data streams, Data model extensions, Database management system engines, Data mining, Sensor networks, XML query languages

Editors and affiliations

  • Minos Garofalakis (1)
  • Johannes Gehrke (2)
  • Rajeev Rastogi (3)
  1. School of ECE, Technical University of Crete, University Campus - Kounoupidiana, Chania, Greece
  2. Microsoft Corporation, Redmond, USA
  3. Amazon India, Bangalore, India

About the editors

Minos Garofalakis is a Professor of Computer Science at the School of Electronic & Computer Engineering of the Technical University of Crete, and the Director of the Software Technology and Network Applications Lab (SoftNet). Previously, he worked as a Member of Technical Staff at Bell Labs, Lucent Technologies (1998-2005), as a Senior Researcher at Intel Research Berkeley (2005-2007), and as a Principal Research Scientist at Yahoo! Research (2007-2008). In parallel, he also held an Adjunct Associate Professor position at the EECS Department of the University of California, Berkeley (2006-2008). Minos’s research interests include database systems, centralized/distributed data streams, data synopses and approximate query processing, uncertain databases, and big-data analytics and mining. He has published over 140 scientific papers in top-tier international conferences and journals in these areas. His work has resulted in 36 US Patent filings (29 patents issued) for companies such as Lucent, Yahoo!, and AT&T. Minos is an ACM Distinguished Scientist (2011) and a recipient of the Bell Labs President's Gold Award (2004) and a Marie-Curie International Reintegration Fellowship (2010).

Johannes Gehrke is a Distinguished Engineer at Microsoft working as an architect and product visionary in the Applications and Services Group. From 1999 to 2015 he was the Tisch University Professor in the Department of Computer Science at Cornell University. Johannes' research interests are in the areas of database systems, data science, and data privacy. Johannes has received a National Science Foundation Career Award, an Alfred P. Sloan Fellowship, an IBM Faculty Award, the Cornell College of Engineering James and Mary Tien Excellence in Teaching Award, the Cornell University Provost's Award for Distinguished Scholarship, a Humboldt Research Award from the Alexander von Humboldt Foundation, the 2011 IEEE Computer Society Technical Achievement Award, and the 2011 Blavatnik Award for Young Scientists from the New York Academy of Sciences. He co-authored the undergraduate textbook Database Management Systems (currently in its third edition), used at universities all over the world. Johannes was Program co-Chair of ACM SIGKDD 2004, VLDB 2007, ICDE 2012, SOCC 2014, and ICDE 2015.

Rajeev Rastogi is the Director of Machine Learning at Amazon. Previously, he was Vice President of Yahoo! Labs Bangalore and the founding Director of the Bell Labs Research Center in Bangalore, India. Rajeev is an ACM Fellow and a Bell Labs Fellow. He is active in the fields of databases, data mining, and networking, and has served on the program committees of several conferences in these areas. He currently serves on the editorial board of CACM, and has been an Associate Editor for IEEE Transactions on Knowledge and Data Engineering in the past. He has published over 125 papers, and holds over 50 patents.



“This impressive volume … covers both the theory and algorithms related to processing high-speed data streams. … this is a very useful and well-written text that can be recommended for students, practitioners, and researchers alike. The subject matter fills in a range of details, starting from the basics (including related mathematical theorems) and progressing to describe recent developments along with adequate references to the existing literature and suggestions for future research.” (Paparao Kavalipati, Computing Reviews, January, 2017)