Abstract
A common technique for improving the performance of database query retrieval is to decluster the database among multiple disks so that retrievals can be parallelized. In this paper we focus on answering range queries over a multidimensional database, where each dimension is divided uniformly to obtain tiles that are placed on different disks; a significant amount of research has addressed how to place the records on disks so as to minimize the retrieval time. Recently, the idea of using replication (i.e., placing records on more than one disk) to improve performance has been introduced. When using replication there are two goals: i) to minimize the retrieval time and ii) to minimize the scheduling overhead of determining which disk supplies a specific record when processing a query. The previously known replicated declustering schemes with low retrieval times are randomized, and one of the primary advantages of randomized schemes is that, with high probability, they balance the load evenly among the disks for large queries. In this paper we introduce a new class of replicated placement schemes, called the shift schemes, that: i) are deterministic, ii) have retrieval performance comparable to that of the randomized schemes, iii) have a strictly optimal retrieval time for all large queries, and iv) have a more efficient query scheduling algorithm than those for the randomized placements. Furthermore, we present experimental results suggesting that the shift schemes have better average retrieval times than the randomized schemes.
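To make the cost model in the abstract concrete, the following minimal sketch measures the retrieval time of a range query on a two-dimensional tiled grid as the largest number of requested tiles that fall on any single disk. It is not the shift scheme introduced in the paper: the placement used here is the classic non-replicated rule that sends tile (i, j) to disk (i + j) mod M, and the function names and the half-open query representation are assumptions made only for illustration.

```python
from collections import Counter
from math import ceil


def disk_modulo(i, j, num_disks):
    """Illustrative non-replicated placement: tile (i, j) goes to disk (i + j) mod M."""
    return (i + j) % num_disks


def retrieval_time(query, num_disks, place=disk_modulo):
    """Parallel retrieval time of a range query, in tile reads.

    `query` is ((row_start, row_end), (col_start, col_end)) with half-open
    ranges (an assumed representation). All disks work in parallel, so the
    cost is the load of the busiest disk.
    """
    (r0, r1), (c0, c1) = query
    load = Counter(
        place(i, j, num_disks)
        for i in range(r0, r1)
        for j in range(c0, c1)
    )
    return max(load.values())


def optimal_time(query, num_disks):
    """Lower bound on the retrieval time: ceil(number of tiles / number of disks)."""
    (r0, r1), (c0, c1) = query
    return ceil((r1 - r0) * (c1 - c0) / num_disks)


if __name__ == "__main__":
    square = ((0, 2), (0, 2))  # 2 x 2 query
    row = ((0, 1), (0, 4))     # 1 x 4 query
    for q in (square, row):
        print(q, retrieval_time(q, 4), optimal_time(q, 4))
```

With four disks, the 1 x 4 query touches every disk once and meets the lower bound, while the 2 x 2 query places two tiles on the same disk and takes twice the optimal time. Gaps of this kind are what replication is meant to close, at the price of having to decide which copy of each tile to read.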
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Frikken, K.B. (2004). Optimal Distributed Declustering Using Replication. In: Eiter, T., Libkin, L. (eds) Database Theory - ICDT 2005. ICDT 2005. Lecture Notes in Computer Science, vol 3363. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30570-5_10
DOI: https://doi.org/10.1007/978-3-540-30570-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24288-8
Online ISBN: 978-3-540-30570-5
eBook Packages: Computer Science (R0)