Towards Efficient Multi-domain Data Processing

Luong, Johannes; Habich, Dirk; Kissinger, Thomas; Lehner, Wolfgang

doi:10.1007/978-3-319-62911-7_3


Part of the book series: Communications in Computer and Information Science (CCIS, volume 737)

Included in the following conference series:

International Conference on Data Management Technologies and Applications


Abstract

Economy and research increasingly depend on the timely analysis of large datasets to guide decision making. Complex analyses often involve a rich variety of data types and special-purpose processing models. We believe that the database system of the future will use compilation techniques to translate specialized, abstract, high-level programming models into scalable low-level operations on efficient physical data formats. We currently envision optimized relational and linear algebra languages, a flexible data flow language inspired by the programming models of popular data flow engines such as Apache Spark (spark.apache.org) and Apache Flink (flink.apache.org), and scalable physical operators and formats for relational and array data types. In this article, we propose a database system architecture designed around these ideas and introduce our prototypical implementation of that architecture.
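The core idea of the abstract, compiling a high-level plan into a low-level loop over a columnar physical format, can be illustrated with a deliberately tiny sketch. It is not the authors' prototype: the plan node names (`Scan`, `Filter`, `Multiply`, `Sum`), the `compile_plan` function, and the dict-of-lists column store are all hypothetical. A relational filter and an element-wise product (a linear-algebra-style operation) are fused into one generated loop instead of being run as separate operators that each materialize an intermediate result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical logical plan nodes for a tiny data flow language.
@dataclass
class Scan:
    table: str                 # name of a columnar table in the database dict

@dataclass
class Filter:
    child: "Plan"
    column: str
    op: str                    # comparison operator, e.g. ">" or "<="
    constant: float

@dataclass
class Multiply:
    child: "Plan"
    out: str                   # name of the computed column
    left: str
    right: str

@dataclass
class Sum:
    child: "Plan"
    column: str

Plan = Union[Scan, Filter, Multiply]
_OPS = {"<", "<=", ">", ">=", "==", "!="}

def _col(name: str, cols: Dict[str, str]) -> str:
    # Computed columns are inlined as expressions; base columns are read
    # from the column arrays of table t at row index i.
    return cols.get(name, f"t[{name!r}][i]")

def _lower(node: Plan) -> Tuple[str, List[str], Dict[str, str]]:
    """Walk a pipeline down to its Scan, collecting predicates and column expressions."""
    if isinstance(node, Scan):
        return node.table, [], {}
    table, preds, cols = _lower(node.child)
    if isinstance(node, Filter):
        if node.op not in _OPS:
            raise ValueError(f"unsupported operator {node.op!r}")
        preds = preds + [f"{_col(node.column, cols)} {node.op} {node.constant!r}"]
    elif isinstance(node, Multiply):
        cols = {**cols, node.out: f"({_col(node.left, cols)} * {_col(node.right, cols)})"}
    else:
        raise TypeError(f"unsupported plan node {type(node).__name__}")
    return table, preds, cols

def compile_plan(plan: Sum) -> Tuple[Callable[[dict], float], str]:
    """Generate and compile one fused loop that evaluates the whole plan."""
    table, preds, cols = _lower(plan.child)
    cond = " and ".join(preds) or "True"
    src = "\n".join([
        "def kernel(db):",
        f"    t = db[{table!r}]",
        "    n = len(next(iter(t.values())))  # row count of the columnar table",
        "    acc = 0.0",
        "    for i in range(n):",
        f"        if {cond}:",
        f"            acc += {_col(plan.column, cols)}",
        "    return acc",
    ])
    namespace: dict = {}
    exec(src, namespace)
    return namespace["kernel"], src

# Usage: sum(price * qty) over the sales rows with qty > 5.
db = {"sales": {"price": [2.0, 5.0, 1.5, 4.0], "qty": [10, 3, 20, 7]}}
plan = Sum(Multiply(Filter(Scan("sales"), "qty", ">", 5), "revenue", "price", "qty"), "revenue")
kernel, src = compile_plan(plan)
print(src)
print(kernel(db))  # 78.0
```

Generating a single fused loop avoids separate passes over memory for the filter and the product column. A production system would emit native code rather than Python source executed with `exec`; the sketch only shows the shape of the translation from an abstract plan to a low-level loop.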


Notes

1. https://www.gnu.org/software/octave/.
2. https://software.intel.com/en-us/intel-tbb.
3. https://software.intel.com/en-us/intel-mkl.
4. In contrast to by record.
5. In practice, the scalability of data-intensive workloads is often limited by memory bandwidth.


Author information

Authors and Affiliations

Database Technology Group, Technische Universität Dresden, 01062 Dresden, Germany
Johannes Luong, Dirk Habich, Thomas Kissinger & Wolfgang Lehner


Corresponding author

Correspondence to Johannes Luong.

Editor information

Editors and Affiliations

Department of Electronics and Information, Politecnico di Milano, Milan, Italy
Chiara Francalanci
School of Computing, Dublin City University, Dublin, Ireland
Markus Helfert


About this paper

Cite this paper

Luong, J., Habich, D., Kissinger, T., Lehner, W. (2017). Towards Efficient Multi-domain Data Processing. In: Francalanci, C., Helfert, M. (eds) Data Management Technologies and Applications. DATA 2016. Communications in Computer and Information Science, vol 737. Springer, Cham. https://doi.org/10.1007/978-3-319-62911-7_3

DOI: https://doi.org/10.1007/978-3-319-62911-7_3
Published: 01 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62910-0
Online ISBN: 978-3-319-62911-7
eBook Packages: Computer Science, Computer Science (R0)