Abstract
The economy and research increasingly depend on the timely analysis of large datasets to guide decision making. Complex analyses often involve a rich variety of data types and special-purpose processing models. We believe that the database system of the future will use compilation techniques to translate specialized and abstract high-level programming models into scalable low-level operations on efficient physical data formats. We currently envision optimized relational and linear algebra languages; a flexible data flow language inspired by the programming models of popular data flow engines such as Apache Spark (spark.apache.org) and Apache Flink (flink.apache.org); and scalable physical operators and formats for relational and array data types. In this article, we propose a database system architecture designed around these ideas and introduce our prototype implementation of that architecture.
Notes
- 1.
- 2.
- 3.
- 4. In contrast to by record.
- 5. In practice, scalability of data-intensive workloads is often limited by memory bandwidth.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Luong, J., Habich, D., Kissinger, T., Lehner, W. (2017). Towards Efficient Multi-domain Data Processing. In: Francalanci, C., Helfert, M. (eds) Data Management Technologies and Applications. DATA 2016. Communications in Computer and Information Science, vol 737. Springer, Cham. https://doi.org/10.1007/978-3-319-62911-7_3
DOI: https://doi.org/10.1007/978-3-319-62911-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62910-0
Online ISBN: 978-3-319-62911-7
eBook Packages: Computer Science (R0)