An On-demand Serialization Mechanism for Trees
In the Big Data era, complex data structures are often too large to reside in main memory. Traditional serialization mechanisms can only read a tree from disk or write it to disk as a whole. When the tree becomes very large, the memory needed to hold the whole tree becomes the bottleneck. To solve this problem, one needs to be able to read or write only part of the tree, and only when necessary. We propose an on-demand serialization mechanism that can read or write tree nodes one at a time while keeping the logical structure intact. The mechanism is implemented in C++ in the GeDBIT (Generalized Distance-Based Index Tree) system. Empirical results demonstrate the functionality and efficiency of our mechanism.
Keywords: Serialization, I/O, index, memory, data
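
The idea of reading or writing one node at a time can be illustrated with a minimal sketch. It is not the GeDBIT implementation: the `Node` layout, the function names `writeNode` and `readNode`, and the choice of storing children as file offsets are all assumptions made for illustration. The sketch also writes raw bytes, so the resulting file is only readable on a machine with the same endianness and type sizes.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical on-disk node: a key plus the file offsets of its children.
// Children are referenced by offset rather than by pointer, so a node can
// be loaded without loading its subtree.
struct Node {
    std::int32_t key = 0;
    std::vector<std::int64_t> children;  // file offsets of child nodes
};

// Write one node at the current put position and return that offset.
std::int64_t writeNode(std::ostream& out, const Node& n) {
    const std::int64_t offset = static_cast<std::int64_t>(out.tellp());
    const std::uint32_t count = static_cast<std::uint32_t>(n.children.size());
    out.write(reinterpret_cast<const char*>(&n.key), sizeof n.key);
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    if (count > 0) {
        out.write(reinterpret_cast<const char*>(n.children.data()),
                  count * sizeof(std::int64_t));
    }
    return offset;
}

// Read exactly one node stored at the given offset; no other node is loaded.
Node readNode(std::istream& in, std::int64_t offset) {
    Node n;
    std::uint32_t count = 0;
    in.seekg(offset);
    in.read(reinterpret_cast<char*>(&n.key), sizeof n.key);
    in.read(reinterpret_cast<char*>(&count), sizeof count);
    n.children.resize(count);
    if (count > 0) {
        in.read(reinterpret_cast<char*>(n.children.data()),
                count * sizeof(std::int64_t));
    }
    return n;
}

// Usage sketch: write two leaves and then the root, then load only the root
// and its second child. An in-memory stream stands in for a disk file.
std::int32_t demoSecondChildKey() {
    std::stringstream file(std::ios::in | std::ios::out | std::ios::binary);
    Node left;
    left.key = 10;
    Node right;
    right.key = 20;
    Node root;
    root.key = 1;
    // Children are written first so the parent can record their offsets.
    root.children = {writeNode(file, left), writeNode(file, right)};
    const std::int64_t rootOffset = writeNode(file, root);

    const Node loadedRoot = readNode(file, rootOffset);
    return readNode(file, loadedRoot.children[1]).key;
}
```

In this sketch the logical structure survives on disk because each parent records where its children live, and memory use during a traversal is bounded by the nodes actually visited rather than by the size of the tree. Replacing the `std::stringstream` with a `std::fstream` opened in binary mode would give the same behavior against a real file.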