Efficient time-interval data extraction in MVCC-based RDBMS

  • Haixiang Li
  • Zhanhao Zhao
  • Yijian Cheng
  • Wei Lu
  • Xiaoyong Du
  • Anqun Pan
Article
Part of the following topical collections:
  1. Special Issue on Web and Big Data

Abstract

Account reconciliation is a core business process in banks and game companies. It regularly checks the account balance against the bank or expense statement for every user and reports the daily, weekly, or monthly balance. Once an account imbalance occurs, the transactions that may have corrupted the balance must be traced efficiently. To support such tracing, in this paper we investigate the problem of efficient time-interval data extraction in MVCC-based RDBMS, i.e., extracting the incremental data that are valid within a given time interval. To this end, we propose a snapshot-based method that extracts incremental data based on the fact that each record is inherently associated with a lifetime, which indicates whether the record can be accessed for a given time interval. We elaborate on how to integrate our method into MySQL, an open-source RDBMS, and propose a declarative way to fetch the incremental data. We also propose several optimization techniques to boost extraction performance. Extensive experiments over the standardized Sysbench benchmark show that our proposed method is robust and efficient.
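
The lifetime idea in the abstract can be illustrated with a small sketch. This is not the authors' MySQL implementation, which the abstract does not detail. It is a minimal illustration, assuming each record version carries a creation time and an optional deletion time, and that a version is "valid" for an interval when its lifetime overlaps that interval. The names `RecordVersion`, `valid_in_interval`, and `extract_interval`, and the half-open interval convention, are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RecordVersion:
    """One version of a row in a hypothetical MVCC store (illustrative only)."""
    key: str
    value: int
    created_at: int             # commit time of the transaction that wrote this version
    deleted_at: Optional[int]   # commit time of the transaction that superseded it; None if still live


def valid_in_interval(v: RecordVersion, start: int, end: int) -> bool:
    """Return True if the lifetime [created_at, deleted_at) overlaps [start, end).

    The half-open convention is an assumption made for this sketch.
    """
    lifetime_end = float("inf") if v.deleted_at is None else v.deleted_at
    return v.created_at < end and lifetime_end > start


def extract_interval(versions: Iterable[RecordVersion], start: int, end: int) -> List[RecordVersion]:
    """Collect every version whose lifetime overlaps the requested interval."""
    return [v for v in versions if valid_in_interval(v, start, end)]


if __name__ == "__main__":
    history = [
        RecordVersion("a", 100, created_at=1, deleted_at=5),
        RecordVersion("a", 70, created_at=5, deleted_at=None),
        RecordVersion("b", 40, created_at=2, deleted_at=3),
        RecordVersion("c", 10, created_at=10, deleted_at=None),
    ]
    for version in extract_interval(history, start=4, end=8):
        print(version)
```

With the sample history, the interval from 4 to 8 selects both versions of key `a` (the old balance and the one that replaced it at time 5), which is the kind of before-and-after pair a reconciliation trace would inspect. A real engine would avoid the full scan used here.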

Keywords

RDBMS; MVCC; Incremental data extraction; Snapshot

Notes

Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (61502504, 61732014) and the Tencent Research Grant for Renmin University of China.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Haixiang Li (1)
  • Zhanhao Zhao (2, 3)
  • Yijian Cheng (2, 3)
  • Wei Lu (2, 3)
  • Xiaoyong Du (2, 3)
  • Anqun Pan (1)
  1. Tencent Inc., Shenzhen, China
  2. Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing, China
  3. School of Information, Renmin University of China, Beijing, China
