SnappyData

Mozafari, Barzan

doi:10.1007/978-3-319-77525-8_258

Introduction

An increasing number of enterprise applications, particularly those in financial trading and IoT (Internet of Things), produce mixed workloads with all of the following: (1) continuous stream processing, (2) online transaction processing (OLTP), and (3) online analytical processing (OLAP). These applications need to simultaneously consume high-velocity streams to trigger real-time alerts, ingest them into a write-optimized transactional store, and perform analytics to derive deep insight quickly. Despite a flurry of data management solutions designed for one or two of these tasks, there is no single solution that is apt for all three.

SQL-on-Hadoop solutions (e.g., Hive, Impala/Kudu and SparkSQL) use OLAP-style optimizations and columnar formats to run OLAP queries over massive volumes of static data. While apt for batch processing, these systems are not designed as real-time operational databases, as they lack the ability to mutate data with transactional consistency, to...

Notes

1. Although IndexedRDD (Indexedrdd for apache spark) offers an updatable key-value store, it does not support colocation for high-rate ingestions or distributed transactions. It is also unsuitable for high availability (HA), as it relies on disk-based checkpoints for fault tolerance.

References

Abadi D et al (2013) The design and implementation of modern column-oriented database systems. Found Trends Databases 5(3):197–280
Agarwal S, Panda A, Mozafari B, Iyer AP, Madden S, Stoica I (2012) Blink and it’s done: interactive queries on very large data. In: PVLDB
Agarwal S, Mozafari B, Panda A, Milner H, Madden S, Stoica I (2013) BlinkDB: queries with bounded errors and bounded response times on very large data. In: EuroSys
Agarwal S, Milner H, Kleiner A, Talwalkar A, Jordan M, Madden S, Mozafari B, Stoica I (2014) Knowing when you’re wrong: building fast and reliable approximate query processing systems. In: SIGMOD
Akidau T et al (2013) MillWheel: fault-tolerant stream processing at internet scale. In: PVLDB
Akidau T et al (2015) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. In: PVLDB
Al-Kateb M, Lee BS (2010) Stratified reservoir sampling over heterogeneous data streams. In: SSDBM
Apache Geode. http://geode.incubator.apache.org/
Apache Samza. http://samza.apache.org/
Barber R, Huras M, Lohman G, Mohan C, Mueller R, Özcan F, Pirahesh H, Raman V, Sidle R, Sidorkin O et al (2016) Wildfire: concurrent blazing data ingest and analytics. In: SIGMOD
Braun L et al (2015) Analytics in motion: high performance event-processing and real-time analytics in the same database. In: SIGMOD
Chandramouli B et al (2014) Trill: a high-performance incremental query processor for diverse analytics. In: PVLDB
Chandrasekaran S et al (2003) TelegraphCQ: continuous dataflow processing. In: SIGMOD
Chaudhuri S, Das G, Narasayya V (2007) Optimized stratified sampling for approximate query processing. ACM Trans Database Syst 32(2):9
CliffGuard. A general framework for robust and efficient database optimization. http://www.cliffguard.org
Exactly-once processing with trident – the fake truth. https://www.alooma.com/blog/trident-exactly-once
Fernandez RC et al (2015) Liquid: unifying nearline and offline big data integration. In: CIDR
Gray J, Lamport L (2006) Consensus on transaction commit. ACM Trans Database Syst 31(1):133–160
He W, Park Y, Hanafi I, Yatvitskiy J, Mozafari B (2018) Demonstration of VerdictDB, the platform-independent AQP system. In: SIGMOD
Helland P (2007) Life beyond distributed transactions: an apostate’s opinion. In: CIDR
Huang J, Mozafari B, Schoenebeck G, Wenisch T (2017) A top-down approach to achieving performance predictability in database systems. In: SIGMOD
IBM. InfoSphere BigInsights. http://tinyurl.com/ouphdss
Indexedrdd for apache spark. https://github.com/amplab/spark-indexedrdd
Liarou E et al (2012) MonetDB/DataCell: online analytics in a streaming column-store. In: PVLDB
Makin’ Bacon and the Three Main Classes of IoT Analytics. http://tinyurl.com/zlc6den
Meehan J et al (2015) S-Store: streaming meets transaction processing. In: PVLDB
Mozafari B (2017) Approximate query engines: commercial challenges and research opportunities. In: SIGMOD
Mozafari B, Zaniolo C (2010) Optimal load shedding with aggregates and mining queries. In: ICDE
Mozafari B, Niu N (2015) A handbook for building an approximate query engine. IEEE Data Eng Bull 38(3):3–29
Mozafari B, Zeng K, Zaniolo C (2012) High-performance complex event processing over XML streams. In: SIGMOD
Mozafari B, Ye Goh EZ, Yoon DY (2015) CliffGuard: a principled framework for finding robust database designs. In: SIGMOD
Mozafari B, Ramnarayan J, Menon S, Mahajan Y, Chakraborty S, Bhanawat H, Bachhav K (2017) SnappyData: a unified cluster for streaming, transactions, and interactive analytics. In: CIDR
Ousterhout K et al (2015) Making sense of performance in data analytics frameworks. In: NSDI
Park Y, Cafarella M, Mozafari B (2016) Visualization-aware sampling for very large databases. In: ICDE
Park Y, Tajik AS, Cafarella M, Mozafari B (2017) Database learning: towards a database that becomes smarter every time. In: SIGMOD
Park Y, Mozafari B, Sorenson J, Wang J (2018) VerdictDB: universalizing approximate query processing. In: SIGMOD
Ramnarayan J, Mozafari B, Menon S, Wale S, Kumar N, Bhanawat H, Chakraborty S, Mahajan Y, Mishra R, Bachhav K (2016) SnappyData: a hybrid transactional analytical store built on Spark. In: SIGMOD
SnappyData (2016) Streaming, transactions, and interactive analytics in a unified engine. http://web.eecs.umich.edu/~mozafari/php/data/uploads/snappy.pdf
Thakkar H, Laptev N, Mousavi H, Mozafari B, Russo V, Zaniolo C (2011) SMM: a data stream management system for knowledge discovery. In: ICDE
TIBCO. StreamBase. http://www.streambase.com/
Tian B, Huang J, Mozafari B, Schoenebeck G, Wenisch T (2018) Contention-aware lock scheduling for transactional databases. In: PVLDB
Toshniwal A, Taneja S, Shukla A, Ramasamy K, Patel JM, Kulkarni S, Jackson J, Gade K, Fu M, Donham J, Bhagat N, Mittal S, Ryaboy D (2014) Storm@twitter. In: SIGMOD
Vitter JS (1985) Random sampling with a reservoir. ACM Trans Math Softw 11(1):37–57
Xin R, Rosen J. Project Tungsten: bringing Spark closer to bare metal. http://tinyurl.com/mzw7hew
Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI
Zamanian E, Binnig C, Salama A (2015) Locality-aware partitioning in parallel database systems. In: SIGMOD
Zeng K, Gao S, Gu J, Mozafari B, Zaniolo C (2014a) ABS: a system for scalable approximate queries with accuracy guarantees. In: SIGMOD
Zeng K, Gao S, Mozafari B, Zaniolo C (2014b) The analytical bootstrap: a new method for fast error estimation in approximate query processing. In: SIGMOD
Zeng K, Agarwal S, Dave A, Armbrust M, Stoica I (2015) G-OLA: generalized on-line aggregation for interactive analysis on big data. In: SIGMOD

Author information

Authors and Affiliations

University of Michigan, Ann Arbor, MI, USA
Barzan Mozafari

Corresponding author

Correspondence to Barzan Mozafari.

Editor information

Editors and Affiliations

Institute of Computer Science, University of Tartu, Tartu, Estonia
Sherif Sakr
School of Information Technologies, Sydney University, Sydney, Australia
Albert Y. Zomaya

About this entry

Cite this entry

Mozafari, B. (2019). SnappyData. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_258

DOI: https://doi.org/10.1007/978-3-319-77525-8_258
Published: 20 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77524-1
Online ISBN: 978-3-319-77525-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
