Abstract
In the real world, temporal data arise in many applications, and their volume is growing rapidly. Managing and querying big temporal data efficiently and effectively is both important and challenging, given the large data volume and the requirement for real-time responses. Processing big temporal data on a distributed system is a natural choice, since a single-machine system usually has limited computing capacity. However, existing distributed systems and methods are either disk-based or lack native support for temporal queries, so they may not meet the demands of low latency and high throughput. To address these issues, this article proposes a new approach to handling big temporal data: an In-memory based Two-level Index Solution in Spark, dubbed ITISS. The proposed framework is easy to understand and implement without sacrificing effectiveness or efficiency. Based on this framework, this article develops targeted algorithms for time travel, temporal aggregation, and temporal join queries. We have implemented the framework in Apache Spark, extended Apache Spark SQL with a declarative SQL interface that lets users issue temporal queries in a few lines of SQL, and conducted extensive experiments to evaluate the performance of our solution. The experimental results, on both real and synthetic datasets, consistently demonstrate that the proposed solution is efficient and competitive for processing big temporal data.
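To make the idea of a two-level index for a time travel query more concrete, here is a minimal single-machine sketch. It is a generic illustration under stated assumptions, not the ITISS algorithm itself. Each record carries a closed validity interval [start, end], and a time travel query at instant t returns every record valid at t. The global level keeps, for each partition, the minimum start and maximum end, so whole partitions can be pruned. The local level keeps each partition's records sorted by start time, so a binary search can cut off records that start after t. The names `Record`, `TwoLevelIndex`, `time_travel`, and `count_at` are hypothetical, and `count_at` shows a simple snapshot COUNT as one form of temporal aggregation.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    start: int  # assumed closed validity interval [start, end]
    end: int


class TwoLevelIndex:
    """Hypothetical two-level index: global partition bounds + local sorted lists."""

    def __init__(self, records, num_partitions):
        data = sorted(records, key=lambda r: r.start)
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        # Range-partition the data by start time (in Spark these would be
        # distributed partitions; here they are plain lists).
        self.partitions = [data[i:i + size] for i in range(0, len(data), size)]
        # Global level: (min start, max end) for each partition.
        self.global_index = [(p[0].start, max(r.end for r in p)) for p in self.partitions]
        # Local level: sorted start times of each partition for binary search.
        self.local_starts = [[r.start for r in p] for p in self.partitions]

    def time_travel(self, t):
        """Return all records whose interval contains instant t."""
        result = []
        for (min_start, max_end), part, starts in zip(
            self.global_index, self.partitions, self.local_starts
        ):
            if min_start > t or max_end < t:
                continue  # pruned at the global level
            hi = bisect_right(starts, t)  # local level: records with start <= t
            result.extend(r for r in part[:hi] if r.end >= t)
        return result

    def count_at(self, t):
        """Snapshot COUNT aggregate: number of records valid at instant t."""
        return len(self.time_travel(t))


if __name__ == "__main__":
    recs = [Record("a", 1, 5), Record("b", 3, 8), Record("c", 6, 10)]
    idx = TwoLevelIndex(recs, num_partitions=2)
    print(sorted(r.key for r in idx.time_travel(4)))  # ['a', 'b']
```

In a distributed setting, the small global level would plausibly sit on the driver while the partitions and their local indexes stay in executor memory. That placement is an assumption made for illustration only.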
Acknowledgements
This work was supported by the NSFC (61872235, 61729202, 61832017, U1636210, U61811264, 61832013 and 61672351), and the National Key Research and Development Program of China (2018YFC1504504, 2016YFB0700502 and 2018YFB1004400).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chen, Z., Yao, B., Wang, ZJ. et al. ITISS: an efficient framework for querying big temporal data. Geoinformatica 24, 27–59 (2020). https://doi.org/10.1007/s10707-019-00362-1
DOI: https://doi.org/10.1007/s10707-019-00362-1