Replicated Parallel I/O without Additional Scheduling Costs
A common technique for improving database performance is to decluster the database among multiple disks so that data retrieval can be parallelized. In this paper we focus on answering range queries in a multidimensional database (such as a GIS), where each dimension is divided uniformly to obtain tiles that are placed on different disks; there has been a significant amount of research on this problem (a subset of which is [1,2,3,4,5,6,7,8,9,11,12,13,14,15]). A declustering scheme would be optimal if any range query could be answered with no more than ⌈(# of tiles inside the range) / (# of disks)⌉ retrievals from any one disk. However, it has been shown that this is not achievable in many cases, even for two dimensions, and therefore much of the research in this area has focused on developing schemes that perform close to optimal. Recently, the idea of using replication (i.e., placing records on more than one disk) to increase performance has been introduced [7,12,13,15]. If replication is used, a retrieval schedule (i.e., which disk to retrieve each tile from) must be computed whenever a query is processed. In this paper we introduce a class of replicated schemes for which the retrieval schedule can be computed in time O(# of tiles inside the query's range), which is asymptotically equivalent to query retrieval in the non-replicated case. Furthermore, this class of schemes has a strong performance advantage over non-replicated schemes, and several schemes are introduced that are either optimal or within a constant additive factor of optimal. Also presented is a strictly optimal scheme for any number of colors that requires the lowest known level of replication of any such scheme.
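The optimality bound above can be made concrete with a small sketch. The code below is an illustration only and is not one of the schemes introduced in the paper: it assumes a two-dimensional grid of tiles and a simple non-replicated assignment that places tile (i, j) on disk (i + j) mod M, and it compares the heaviest per-disk load for a range query with the bound ⌈tiles / disks⌉. The function names `optimal_bound`, `disk_modulo`, and `query_cost` are hypothetical.

```python
from collections import Counter
from math import ceil


def optimal_bound(num_tiles, num_disks):
    """Best possible number of retrievals from the busiest disk."""
    return ceil(num_tiles / num_disks)


def disk_modulo(i, j, num_disks):
    """Illustrative non-replicated assignment (an assumption, not the paper's scheme)."""
    return (i + j) % num_disks


def query_cost(rows, cols, num_disks, assign=disk_modulo):
    """Retrievals needed from the busiest disk for the tiles in rows x cols.

    rows and cols are iterables of tile indices, e.g. range(0, 2).
    """
    load = Counter(assign(i, j, num_disks) for i in rows for j in cols)
    return max(load.values())


if __name__ == "__main__":
    disks = 5
    rows, cols = range(0, 2), range(0, 2)   # a 2 x 2 range query
    tiles = len(rows) * len(cols)
    print("bound:", optimal_bound(tiles, disks))
    print("cost: ", query_cost(rows, cols, disks))
```

With 5 disks, the 2 x 2 query touches 4 tiles, so the bound is 1, yet tiles (0, 1) and (1, 0) land on the same disk under this assignment and the busiest disk must serve 2 retrievals. Gaps of this kind are what replication is meant to close.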
- 1. Abdel-Ghaffar, K., Abbadi, A.E.: Optimal allocation of two-dimensional data. In: International Conference on Database Theory, pp. 409–418 (1997)
- 3. Bhatia, R., Sinha, R., Chen, C.-M.: Hierarchical declustering schemes for range queries. In: 7th Int’l Conf. on Extending Database Technology (2000)
- 4. Bhatia, R., Sinha, R.K., Chen, C.-M.: Declustering using golden ratio sequences. In: ICDE, pp. 271–280 (2000)
- 6. Du, H., Sobolewski, J.: Disk allocation for cartesian product files on multiple disk systems. ACM Transactions on Database Systems, 82–101 (1982)
- 8. Himatsingka, B., Srivastava, J., Li, J.-Z., Rotem, D.: Latin hypercubes: A class of multidimensional declustering techniques (1994)
- 11. Prabhakar, S., Abdel-Ghaffar, K., Agrawal, D., Abbadi, A.E.: Cyclic allocation of two-dimensional data. Technical Report TRCS97-08, 1 (1997)
- 12. Sanders, P.: Reconciling simplicity and realism in parallel disk models. In: Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pp. 67–76. ACM Press, New York (2001)
- 13. Sanders, P., Egner, S., Korst, J.: Fast concurrent access to parallel disks. In: Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, pp. 849–858. ACM Press, New York (2000)
- 14. Sinha, R.K., Bhatia, R., Chen, C.-M.: Asymptotically optimal declustering schemes for range queries. In: International Conference on Database Theory (2001)
- 15. Tosun, A., Ferhatosmanoglu, H.: Optimal parallel I/O using replication. Technical Report OSU-CISRC-11/01-TR26 (2001)