SnappyData

Introduction

An increasing number of enterprise applications, particularly those in financial trading and IoT (Internet of Things), produce mixed workloads with all of the following: (1) continuous stream processing, (2) online transaction processing (OLTP), and (3) online analytical processing (OLAP). These applications need to simultaneously consume high-velocity streams to trigger real-time alerts, ingest them into a write-optimized transactional store, and perform analytics to derive deep insight quickly. Despite a flurry of data management solutions designed for one or two of these tasks, there is no single solution that is apt for all three.

SQL-on-Hadoop solutions (e.g., Hive, Impala/Kudu and SparkSQL) use OLAP-style optimizations and columnar formats to run OLAP queries over massive volumes of static data. While apt for batch processing, these systems are not designed as real-time operational databases, as they lack the ability to mutate data with transactional consistency, to...

Notes

  1. Although IndexedRDD (Indexedrdd for apache spark) offers an updatable key-value store, it does not support colocation for high-rate ingestion or distributed transactions. It is also unsuitable for high availability (HA), as it relies on disk-based checkpoints for fault tolerance.

References

  • Abadi D et al (2013) The design and implementation of modern column-oriented database systems. Found Trends Databases 5(3):197–280

  • Agarwal S, Panda A, Mozafari B, Iyer AP, Madden S, Stoica I (2012) Blink and it’s done: interactive queries on very large data. In: PVLDB

  • Agarwal S, Mozafari B, Panda A, Milner H, Madden S, Stoica I (2013) BlinkDB: queries with bounded errors and bounded response times on very large data. In: EuroSys

  • Agarwal S, Milner H, Kleiner A, Talwalkar A, Jordan M, Madden S, Mozafari B, Stoica I (2014) Knowing when you’re wrong: building fast and reliable approximate query processing systems. In: SIGMOD

  • Akidau T et al (2013) MillWheel: fault-tolerant stream processing at internet scale. In: PVLDB

  • Akidau T et al (2015) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. In: PVLDB

  • Al-Kateb M, Lee BS (2010) Stratified reservoir sampling over heterogeneous data streams. In: SSDBM

  • Apache Geode. http://geode.incubator.apache.org/

  • Apache Samza. http://samza.apache.org/

  • Barber R, Huras M, Lohman G, Mohan C, Mueller R, Özcan F, Pirahesh H, Raman V, Sidle R, Sidorkin O et al (2016) Wildfire: concurrent blazing data ingest and analytics. In: SIGMOD

  • Braun L et al (2015) Analytics in motion: high performance event-processing and real-time analytics in the same database. In: SIGMOD

  • Chandramouli B et al (2014) Trill: a high-performance incremental query processor for diverse analytics. In: PVLDB

  • Chandrasekaran S et al (2003) TelegraphCQ: continuous dataflow processing. In: SIGMOD

  • Chaudhuri S, Das G, Narasayya V (2007) Optimized stratified sampling for approximate query processing. ACM Trans Database Syst 32(2):9

  • CliffGuard. A general framework for robust and efficient database optimization. http://www.cliffguard.org

  • Exactly-once processing with trident – the fake truth. https://www.alooma.com/blog/trident-exactly-once

  • Fernandez RC et al (2015) Liquid: unifying nearline and offline big data integration. In: CIDR

  • Gray J, Lamport L (2006) Consensus on transaction commit. ACM Trans Database Syst 31(1):133–160

  • He W, Park Y, Hanafi I, Yatvitskiy J, Mozafari B (2018) Demonstration of VerdictDB, the platform-independent AQP system. In: SIGMOD

  • Helland P (2007) Life beyond distributed transactions: an apostate’s opinion. In: CIDR

  • Huang J, Mozafari B, Schoenebeck G, Wenisch T (2017) A top-down approach to achieving performance predictability in database systems. In: SIGMOD

  • IBM. InfoSphere BigInsights. http://tinyurl.com/ouphdss

  • Indexedrdd for apache spark. https://github.com/amplab/spark-indexedrdd

  • Liarou E et al (2012) MonetDB/DataCell: online analytics in a streaming column-store. In: PVLDB

  • Makin’ Bacon and the Three Main Classes of IoT Analytics. http://tinyurl.com/zlc6den

  • Meehan J et al (2015) S-Store: streaming meets transaction processing. In: PVLDB

  • Mozafari B (2017) Approximate query engines: commercial challenges and research opportunities. In: SIGMOD

  • Mozafari B, Zaniolo C (2010) Optimal load shedding with aggregates and mining queries. In: ICDE

  • Mozafari B, Niu N (2015) A handbook for building an approximate query engine. IEEE Data Eng Bull 38(3):3–29

  • Mozafari B, Zeng K, Zaniolo C (2012) High-performance complex event processing over XML streams. In: SIGMOD

  • Mozafari B, Ye Goh EZ, Yoon DY (2015) CliffGuard: a principled framework for finding robust database designs. In: SIGMOD

  • Mozafari B, Ramnarayan J, Menon S, Mahajan Y, Chakraborty S, Bhanawat H, Bachhav K (2017) SnappyData: a unified cluster for streaming, transactions, and interactive analytics. In: CIDR

  • Ousterhout K et al (2015) Making sense of performance in data analytics frameworks. In: NSDI

  • Park Y, Cafarella M, Mozafari B (2016) Visualization-aware sampling for very large databases. In: ICDE

  • Park Y, Tajik AS, Cafarella M, Mozafari B (2017) Database learning: towards a database that becomes smarter every time. In: SIGMOD

  • Park Y, Mozafari B, Sorenson J, Wang J (2018) VerdictDB: universalizing approximate query processing. In: SIGMOD

  • Ramnarayan J, Mozafari B, Menon S, Wale S, Kumar N, Bhanawat H, Chakraborty S, Mahajan Y, Mishra R, Bachhav K (2016) SnappyData: a hybrid transactional analytical store built on spark. In: SIGMOD

  • SnappyData (2016) Streaming, transactions, and interactive analytics in a unified engine. http://web.eecs.umich.edu/~mozafari/php/data/uploads/snappy.pdf

  • Thakkar H, Laptev N, Mousavi H, Mozafari B, Russo V, Zaniolo C (2011) SMM: a data stream management system for knowledge discovery. In: ICDE

  • TIBCO. StreamBase. http://www.streambase.com/

  • Tian B, Huang J, Mozafari B, Schoenebeck G, Wenisch T (2018) Contention-aware lock scheduling for transactional databases. In: PVLDB

  • Toshniwal A, Taneja S, Shukla A, Ramasamy K, Patel JM, Kulkarni S, Jackson J, Gade K, Fu M, Donham J, Bhagat N, Mittal S, Ryaboy D (2014) Storm@twitter. In: SIGMOD

  • Vitter JS (1985) Random sampling with a reservoir. ACM Trans Math Softw 11(1):37–57

  • Xin R, Rosen J. Project Tungsten: bringing Spark closer to bare metal. http://tinyurl.com/mzw7hew

  • Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI

  • Zamanian E, Binnig C, Salama A (2015) Locality-aware partitioning in parallel database systems. In: SIGMOD

  • Zeng K, Gao S, Gu J, Mozafari B, Zaniolo C (2014a) ABS: a system for scalable approximate queries with accuracy guarantees. In: SIGMOD

  • Zeng K, Gao S, Mozafari B, Zaniolo C (2014b) The analytical bootstrap: a new method for fast error estimation in approximate query processing. In: SIGMOD

  • Zeng K, Agarwal S, Dave A, Armbrust M, Stoica I (2015) G-OLA: generalized on-line aggregation for interactive analysis on big data. In: SIGMOD

Corresponding author

Correspondence to Barzan Mozafari.

Copyright information

© 2019 Springer Nature Switzerland AG

About this entry

Cite this entry

Mozafari, B. (2019). SnappyData. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_258
