Distributed In-Memory Analytics for Big Temporal Data

Yao, Bin; Zhang, Wei; Wang, Zhi-Jie; Chen, Zhongpu; Shang, Shuo; Zheng, Kai; Guo, Minyi

doi:10.1007/978-3-319-91452-7_36

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10827)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Temporal data is ubiquitous, and massive amounts of it are generated today. Managing big temporal data is important yet challenging, and processing it with a distributed system is a natural choice. However, existing distributed systems and methods either cannot support native queries or are disk-based, and therefore cannot adequately meet the requirements of high throughput and low latency. To address this issue, this paper proposes an In-memory based Two-level Index Solution in Spark (ITISS) for processing big temporal data. The framework is easy to understand and implement, without sacrificing efficiency. We conduct extensive experiments to verify the performance of our solution. Experimental results on both real and synthetic datasets consistently demonstrate that the solution is efficient and competitive.
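
The abstract names a two-level index but gives no details, so the following is only a minimal sketch of the general idea, not the ITISS design. It assumes interval records of the form (start, end, payload) with half-open validity [start, end), a global level that keeps one time span per partition for pruning, and a local level that keeps sorted start times inside each partition. All names here (`Partition`, `build_index`, `query_at`) are hypothetical, and plain Python lists stand in for Spark partitions.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple

# Assumed record shape: (start, end, payload), valid over [start, end).
Interval = Tuple[int, int, str]


@dataclass
class Partition:
    min_start: int            # global-level summary: earliest start in the partition
    max_end: int              # global-level summary: latest end in the partition
    starts: List[int]         # local-level index: sorted start times
    records: List[Interval]   # records sorted by start time


def build_index(records: List[Interval], num_partitions: int) -> List[Partition]:
    """Range-partition records by start time and summarise each partition."""
    ordered = sorted(records, key=lambda r: r[0])
    size = max(1, -(-len(ordered) // num_partitions))  # ceiling division
    partitions = []
    for i in range(0, len(ordered), size):
        chunk = ordered[i:i + size]
        partitions.append(Partition(
            min_start=chunk[0][0],
            max_end=max(r[1] for r in chunk),
            starts=[r[0] for r in chunk],
            records=chunk,
        ))
    return partitions


def query_at(partitions: List[Partition], t: int) -> List[Interval]:
    """Return all records whose interval contains time t."""
    hits: List[Interval] = []
    for p in partitions:
        # Global level: skip partitions whose time span cannot contain t.
        if t < p.min_start or t >= p.max_end:
            continue
        # Local level: only records with start <= t are candidates.
        upto = bisect_right(p.starts, t)
        hits.extend(r for r in p.records[:upto] if t < r[1])
    return hits


records = [(1, 5, "a"), (2, 3, "b"), (4, 10, "c"), (8, 12, "d")]
index = build_index(records, num_partitions=2)
print(query_at(index, 4))
```

In this sketch the global summaries are small enough to sit on a driver node, so a query touches only the partitions that can contain matching records. The local scan is still linear in the number of candidates; a real system would likely use a more selective per-partition structure.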

Acknowledgments

This work was supported by the National Basic Research Program (973 Program, No. 2015CB352403), the NSFC (U1636210, 61729202, 91438121, 61672351, 61472453, U1401256, U1501252, U1611264, U1711261 and U1711262), the National Key Research and Development Program of China (2016YFB0700502), the Scientific Innovation Act of STCSM (15JC1402400), the Opening Projects of Guangdong Key Laboratory of Big Data Analysis and Processing (201808), Guangdong Province Key Laboratory of Popular High Performance Computers of Shenzhen University (SZU-GDPHPCL2017), and Microsoft Research Asia.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Bin Yao, Wei Zhang, Zhongpu Chen & Minyi Guo
Guangdong Province Key Laboratory of Big Data Analysis and Processing, Guangzhou, China
Bin Yao
Guangdong Province Key Laboratory of Popular High Performance Computers, Guangzhou, China
Wei Zhang
School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
Zhi-Jie Wang
Extreme Computing Research Center, King Abdullah University of Science and Technology, Mecca, Saudi Arabia
Shuo Shang
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Kai Zheng

Corresponding authors

Correspondence to Shuo Shang or Kai Zheng.

Editor information

Editors and Affiliations

Simon Fraser University, Burnaby, BC, Canada
Jian Pei
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
University of Queensland, Brisbane, QLD, Australia
Shazia Sadiq
University of Western Australia, Crawley, WA, Australia
Jianxin Li

About this paper

Cite this paper

Yao, B. et al. (2018). Distributed In-Memory Analytics for Big Temporal Data. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_36

DOI: https://doi.org/10.1007/978-3-319-91452-7_36
Published: 13 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91451-0
Online ISBN: 978-3-319-91452-7
eBook Packages: Computer Science, Computer Science (R0)
