SnappyData

Encyclopedia of Big Data Technologies

Introduction

An increasing number of enterprise applications, particularly those in financial trading and IoT (Internet of Things), produce mixed workloads with all of the following: (1) continuous stream processing, (2) online transaction processing (OLTP), and (3) online analytical processing (OLAP). These applications need to simultaneously consume high-velocity streams to trigger real-time alerts, ingest them into a write-optimized transactional store, and perform analytics to derive deep insight quickly. Despite a flurry of data management solutions designed for one or two of these tasks, there is no single solution that is apt for all three.

SQL-on-Hadoop solutions (e.g., Hive, Impala/Kudu and SparkSQL) use OLAP-style optimizations and columnar formats to run OLAP queries over massive volumes of static data. While apt for batch processing, these systems are not designed as real-time operational databases, as they lack the ability to mutate data with transactional consistency, to...

Notes

  1. Although IndexedRDD (Indexedrdd for apache spark) offers an updatable key-value store, it does not support colocation for high-rate ingestion or distributed transactions. It is also unsuitable for high availability (HA), as it relies on disk-based checkpoints for fault tolerance.

References

  • Abadi D et al (2013) The design and implementation of modern column-oriented database systems. Found Trends Databases 5(3):197–280

  • Agarwal S, Panda A, Mozafari B, Iyer AP, Madden S, Stoica I (2012) Blink and it’s done: interactive queries on very large data. In: PVLDB

  • Agarwal S, Mozafari B, Panda A, Milner H, Madden S, Stoica I (2013) BlinkDB: queries with bounded errors and bounded response times on very large data. In: EuroSys

  • Agarwal S, Milner H, Kleiner A, Talwalkar A, Jordan M, Madden S, Mozafari B, Stoica I (2014) Knowing when you’re wrong: building fast and reliable approximate query processing systems. In: SIGMOD

  • Akidau T et al (2013) MillWheel: fault-tolerant stream processing at internet scale. In: PVLDB

  • Akidau T et al (2015) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. In: PVLDB

  • Al-Kateb M, Lee BS (2010) Stratified reservoir sampling over heterogeneous data streams. In: SSDBM

  • Apache Geode. http://geode.incubator.apache.org/

  • Apache Samza. http://samza.apache.org/

  • Barber R, Huras M, Lohman G, Mohan C, Mueller R, Özcan F, Pirahesh H, Raman V, Sidle R, Sidorkin O et al (2016) Wildfire: concurrent blazing data ingest and analytics. In: SIGMOD

  • Braun L et al (2015) Analytics in motion: high performance event-processing and real-time analytics in the same database. In: SIGMOD

  • Chandramouli B et al (2014) Trill: a high-performance incremental query processor for diverse analytics. In: PVLDB

  • Chandrasekaran S et al (2003) TelegraphCQ: continuous dataflow processing. In: SIGMOD

  • Chaudhuri S, Das G, Narasayya V (2007) Optimized stratified sampling for approximate query processing. ACM Trans Database Syst 32(2):9

  • CliffGuard. A general framework for robust and efficient database optimization. http://www.cliffguard.org

  • Exactly-once processing with trident – the fake truth. https://www.alooma.com/blog/trident-exactly-once

  • Fernandez RC et al (2015) Liquid: unifying nearline and offline big data integration. In: CIDR

  • Gray J, Lamport L (2006) Consensus on transaction commit. ACM Trans Database Syst 31(1):133–160

  • He W, Park Y, Hanafi I, Yatvitskiy J, Mozafari B (2018) Demonstration of VerdictDB, the platform-independent AQP system. In: SIGMOD

  • Helland P (2007) Life beyond distributed transactions: an apostate’s opinion. In: CIDR

  • Huang J, Mozafari B, Schoenebeck G, Wenisch T (2017) A top-down approach to achieving performance predictability in database systems. In: SIGMOD

  • IBM. InfoSphere BigInsights. http://tinyurl.com/ouphdss

  • Indexedrdd for apache spark. https://github.com/amplab/spark-indexedrdd

  • Liarou E et al (2012) Monetdb/datacell: online analytics in a streaming column-store. In: PVLDB

  • Makin’ Bacon and the Three Main Classes of IoT Analytics. http://tinyurl.com/zlc6den

  • Meehan J et al (2015) S-store: streaming meets transaction processing. In: PVLDB

  • Mozafari B (2017) Approximate query engines: commercial challenges and research opportunities. In: SIGMOD

  • Mozafari B, Zaniolo C (2010) Optimal load shedding with aggregates and mining queries. In: ICDE

  • Mozafari B, Niu N (2015) A handbook for building an approximate query engine. IEEE Data Eng Bull 38(3):3–29

  • Mozafari B, Zeng K, Zaniolo C (2012) High-performance complex event processing over xml streams. In: SIGMOD

  • Mozafari B, Ye Goh EZ, Yoon DY (2015) CliffGuard: a principled framework for finding robust database designs. In: SIGMOD

  • Mozafari B, Ramnarayan J, Menon S, Mahajan Y, Chakraborty S, Bhanawat H, Bachhav K (2017) SnappyData: a unified cluster for streaming, transactions, and interactive analytics. In: CIDR

  • Ousterhout K et al (2015) Making sense of performance in data analytics frameworks. In: NSDI

  • Park Y, Cafarella M, Mozafari B (2016) Visualization-aware sampling for very large databases. In: ICDE

  • Park Y, Tajik AS, Cafarella M, Mozafari B (2017) Database learning: towards a database that becomes smarter every time. In: SIGMOD

  • Park Y, Mozafari B, Sorenson J, Wang J (2018) VerdictDB: universalizing approximate query processing. In: SIGMOD

  • Ramnarayan J, Mozafari B, Menon S, Wale S, Kumar N, Bhanawat H, Chakraborty S, Mahajan Y, Mishra R, Bachhav K (2016) SnappyData: a hybrid transactional analytical store built on spark. In: SIGMOD

  • SnappyData (2016) Streaming, transactions, and interactive analytics in a unified engine. http://web.eecs.umich.edu/~mozafari/php/data/uploads/snappy.pdf

  • Thakkar H, Laptev N, Mousavi H, Mozafari B, Russo V, Zaniolo C (2011) SMM: a data stream management system for knowledge discovery. In: ICDE

  • TIBCO. StreamBase. http://www.streambase.com/

  • Tian B, Huang J, Mozafari B, Schoenebeck G, Wenisch T (2018) Contention-aware lock scheduling for transactional databases. In: PVLDB

  • Toshniwal A, Taneja S, Shukla A, Ramasamy K, Patel JM, Kulkarni S, Jackson J, Gade K, Fu M, Donham J, Bhagat N, Mittal S, Ryaboy D (2014) Storm@twitter. In: SIGMOD

  • Vitter JS (1985) Random sampling with a reservoir. ACM Trans Math Softw 11(1):37–57

  • Xin R, Rosen J. Project Tungsten: bringing Spark closer to bare metal. http://tinyurl.com/mzw7hew

  • Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI

  • Zamanian E, Binnig C, Salama A (2015) Locality-aware partitioning in parallel database systems. In: SIGMOD

  • Zeng K, Gao S, Gu J, Mozafari B, Zaniolo C (2014a) ABS: a system for scalable approximate queries with accuracy guarantees. In: SIGMOD

  • Zeng K, Gao S, Mozafari B, Zaniolo C (2014b) The analytical bootstrap: a new method for fast error estimation in approximate query processing. In: SIGMOD

  • Zeng K, Agarwal S, Dave A, Armbrust M, Stoica I (2015) G-OLA: generalized on-line aggregation for interactive analysis on big data. In: SIGMOD

Corresponding author

Correspondence to Barzan Mozafari.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this entry

Mozafari, B. (2019). SnappyData. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_258
