An end-to-end data pipeline benchmark is a standardized suite of data-ingestion, data-processing, and query workloads arranged in a series of stages, where the output of each stage feeds the next, thereby exercising the system characteristics required by commonly constructed data pipeline workloads.
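The staged structure described above can be sketched as follows. This is a minimal illustration, not any particular benchmark's code: the stage names (`ingest`, `process`, `query`) and the inline sample data are hypothetical, chosen only to show how each stage consumes the previous stage's output.

```python
def ingest():
    # Ingestion stage: produce raw records (stubbed with inline data
    # in place of a real source such as a log stream or file system).
    return [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}]

def process(records):
    # Processing stage: transform raw records into per-user totals.
    return {r["user"]: r["clicks"] for r in records}

def query(totals, threshold):
    # Query stage: answer an analytical question over the processed data.
    return [user for user, clicks in totals.items() if clicks >= threshold]

def run_pipeline(threshold=4):
    # The stages form a series; each stage's output feeds the next.
    return query(process(ingest()), threshold)

print(run_pipeline())  # users with at least 4 clicks
```

An end-to-end benchmark measures this whole chain rather than any single stage in isolation, since bottlenecks often appear at the hand-off points between stages.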
As we witness a rapid transformation in data architecture, in which relational database management systems (RDBMSs) are being supplemented by large-scale non-relational stores such as the Hadoop Distributed File System (HDFS), MongoDB, Apache Cassandra, and Apache HBase, a more fundamental shift is under way that will demand even larger changes to modern data architectures. While the current shift was driven by the business requirements of a connected world, the next wave will be dictated by operational cost optimization and transformative changes in the underlying infrastructure...