Encyclopedia of Big Data Technologies

2019 Edition
Editors: Sherif Sakr, Albert Y. Zomaya


  • Barzan Mozafari
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_258


An increasing number of enterprise applications, particularly those in financial trading and IoT (Internet of Things), produce mixed workloads with all of the following: (1) continuous stream processing, (2) online transaction processing (OLTP), and (3) online analytical processing (OLAP). These applications need to simultaneously consume high-velocity streams to trigger real-time alerts, ingest them into a write-optimized transactional store, and perform analytics to derive deep insight quickly. Despite a flurry of data management solutions designed for one or two of these tasks, there is no single solution that is apt for all three.

SQL-on-Hadoop solutions (e.g., Hive, Impala/Kudu, and SparkSQL) use OLAP-style optimizations and columnar formats to run OLAP queries over massive volumes of static data. While apt for batch processing, these systems are not designed as real-time operational databases, as they lack the ability to mutate data with transactional consistency, to...




Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Michigan, Ann Arbor, USA