Distributed In-Memory Analytics for Big Temporal Data

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10827)

Abstract

Temporal data is ubiquitous, and massive amounts of it are generated nowadays. Managing big temporal data is important yet challenging, and processing it with a distributed system is a natural choice. However, existing distributed systems and methods either cannot support native queries or are disk-based solutions, and so cannot satisfy the requirements of high throughput and low latency. To address this issue, this paper proposes an In-memory based Two-level Index Solution in Spark (ITISS) for processing big temporal data. The framework of our system is easy to understand and implement without sacrificing efficiency. We conduct extensive experiments to verify the performance of our solution. Experimental results on both real and synthetic datasets consistently demonstrate that our solution is efficient and competitive.

Acknowledgments

This work was supported by the National Basic Research Program (973 Program, No. 2015CB352403), the NSFC (U1636210, 61729202, 91438121, 61672351, 61472453, U1401256, U1501252, U1611264, U1711261 and U1711262), the National Key Research and Development Program of China (2016YFB0700502), the Scientific Innovation Act of STCSM (15JC1402400), the Opening Projects of Guangdong Key Laboratory of Big Data Analysis and Processing (201808), Guangdong Province Key Laboratory of Popular High Performance Computers of Shenzhen University (SZU-GDPHPCL2017), and Microsoft Research Asia.

Author information

Corresponding authors

Correspondence to Shuo Shang or Kai Zheng.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Yao, B. et al. (2018). Distributed In-Memory Analytics for Big Temporal Data. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_36

  • DOI: https://doi.org/10.1007/978-3-319-91452-7_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91451-0

  • Online ISBN: 978-3-319-91452-7

  • eBook Packages: Computer Science, Computer Science (R0)
