
Replicated Parallel I/O without Additional Scheduling Costs

  • Mikhail Atallah
  • Keith Frikken
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2736)

Abstract

A common technique for improving performance in a database is to decluster the database among multiple disks so that data retrieval can be parallelized. In this paper we focus on answering range queries in a multidimensional database (such as a GIS), where each dimension is divided uniformly into tiles that are placed on different disks; there has been a significant amount of research on this problem (see, e.g., [1,2,3,4,5,6,7,8,9,11,12,13,14,15]). A declustering scheme is optimal if any range query can be answered with no more than ⌈(number of tiles in the range)/(number of disks)⌉ retrievals from any one disk. However, it was shown in [1] that this is not achievable in many cases, even in two dimensions, and therefore much of the research in this area has focused on developing schemes that perform close to optimal. Recently, the idea of using replication (i.e., placing records on more than one disk) to increase performance has been introduced [7,12,13,15]. If replication is used, a retrieval schedule (i.e., the choice of which disk to retrieve each tile from) must be computed whenever a query is processed. In this paper we introduce a class of replicated schemes for which the retrieval schedule can be computed in time O(number of tiles in the query's range), which is asymptotically equivalent to the cost of query retrieval in the non-replicated case. Furthermore, this class of schemes has a strong performance advantage over non-replicated schemes, and several schemes are introduced that are either optimal or optimal plus a constant additive term. Also presented in this paper is a strictly optimal scheme for any number of colors (i.e., disks) that requires the lowest known level of replication of any such scheme.
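To make the optimality criterion concrete, the sketch below brute-forces it on a small grid. It is an illustration under stated assumptions, not the authors' method: it uses the non-replicated Disk Modulo allocation (tile (i, j) goes to disk (i + j) mod M, a classic scheme usually credited to [6]) on a two-dimensional N × N grid, and all function names are hypothetical. For every rectangular range query it compares the largest per-disk load against the ⌈tiles/disks⌉ bound and reports the worst additive gap.

```python
from math import ceil


def disk_modulo(i, j, m):
    """Hypothetical helper: Disk Modulo allocation, tile (i, j) -> disk (i + j) mod m."""
    return (i + j) % m


def optimal_cost(num_tiles, m):
    """Best possible per-disk retrievals for a query: ceil(tiles / disks)."""
    return ceil(num_tiles / m)


def query_cost(r0, c0, r1, c1, m, alloc=disk_modulo):
    """Largest number of tiles any single disk must fetch for [r0..r1] x [c0..c1]."""
    load = [0] * m
    for i in range(r0, r1 + 1):
        for j in range(c0, c1 + 1):
            load[alloc(i, j, m)] += 1
    return max(load)


def _intervals(n):
    """All index intervals [a, b] with 0 <= a <= b < n."""
    return [(a, b) for a in range(n) for b in range(a, n)]


def worst_additive_error(n, m, alloc=disk_modulo):
    """Max of (actual cost - optimal cost) over every range query on an n x n grid."""
    worst = 0
    for r0, r1 in _intervals(n):
        for c0, c1 in _intervals(n):
            tiles = (r1 - r0 + 1) * (c1 - c0 + 1)
            gap = query_cost(r0, c0, r1, c1, m, alloc) - optimal_cost(tiles, m)
            worst = max(worst, gap)
    return worst


if __name__ == "__main__":
    # Print the worst additive gap of Disk Modulo for a few disk counts.
    for m in (2, 3, 4, 5):
        print(m, worst_additive_error(8, m))
```

For example, with M = 2 the checkerboard pattern splits every rectangle as evenly as possible, so the gap is 0, whereas with M = 4 the 2 × 2 query at the origin already places two tiles on disk 1 when the bound is 1. The exhaustive check costs O(N^6) and is only meant for tiny grids.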

References

  1. Abdel-Ghaffar, K., Abbadi, A.E.: Optimal allocation of two-dimensional data. In: International Conference on Database Theory, pp. 409–418 (1997)
  2. Atallah, M.J., Prabhakar, S.: (Almost) optimal parallel block access to range queries. In: Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp. 205–215. ACM Press, New York (2000)
  3. Bhatia, R., Sinha, R., Chen, C.-M.: Hierarchical declustering schemes for range queries. In: 7th Int’l Conf. on Extending Database Technology (2000)
  4. Bhatia, R., Sinha, R.K., Chen, C.-M.: Declustering using golden ratio sequences. In: ICDE, pp. 271–280 (2000)
  5. Chen, C.-M., Cheng, C.T.: From discrepancy to declustering: near-optimal multidimensional declustering strategies for range queries. In: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp. 29–38. ACM Press, New York (2002)
  6. Du, H., Sobolewski, J.: Disk allocation for cartesian product files on multiple disk systems. ACM Transactions on Database Systems, 82–101 (1982)
  7. Frikken, K., Atallah, M., Prabhakar, S., Safavi-Naini, R.: Optimal parallel I/O for range queries through replication. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds.) DEXA 2002. LNCS, vol. 2453, pp. 669–678. Springer, Heidelberg (2002)
  8. Himatsingka, B., Srivastava, J., Li, J.-Z., Rotem, D.: Latin hypercubes: A class of multidimensional declustering techniques (1994)
  9. Kim, M.H., Pramanik, S.: Optimal file distribution for partial match retrieval. In: Proceedings of the 1988 ACM SIGMOD international conference on Management of data, pp. 173–182. ACM Press, New York (1988)
  10. Matousek, J.: Geometric discrepancy, an illustrated guide. Springer, Heidelberg (1999)
  11. Prabhakar, S., Abdel-Ghaffar, K., Agrawal, D., Abbadi, A.E.: Cyclic allocation of two-dimensional data. Technical Report TRCS97-08, 1 (1997)
  12. Sanders, P.: Reconciling simplicity and realism in parallel disk models. In: Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pp. 67–76. ACM Press, New York (2001)
  13. Sanders, P., Egner, S., Korst, J.: Fast concurrent access to parallel disks. In: Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, pp. 849–858. ACM Press, New York (2000)
  14. Sinha, R.K., Bhatia, R., Chen, C.-M.: Asymptotically optimal declustering schemes for range queries. In: International Conference on Database Theory (2001)
  15. Tosun, A., Ferhatosmanoglu, H.: Optimal parallel I/O using replication. Technical Report OSU-CISRC-11/01-TR26 (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Mikhail Atallah (1)
  • Keith Frikken (1)
  1. Purdue University
