Generating High Dimensional Data and Query Sets

  • Conference paper
SOFSEM 2007: Theory and Practice of Computer Science (SOFSEM 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4362)

Abstract

Previous research on multidimensional indexes has typically used synthetic data sets distributed uniformly or normally over multidimensional space for performance evaluation. Such data sets hardly reflect the characteristics of multimedia database applications. In this paper, we discuss issues in generating high-dimensional data and query sets to address this problem. We first identify the requirements that data and query sets must meet for a fair performance evaluation of multidimensional indexes, and then propose HDDQ_Gen (High-Dimensional Data and Query Generator), which satisfies these requirements. HDDQ_Gen has the following features: (1) clustered distribution, (2) various object distributions within each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) a query distribution that depends on the data distribution. Using these features, users can control the distribution characteristics of data and query sets to suit their target applications.
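The five features listed in the abstract can be illustrated with a minimal sketch. This is not the authors' HDDQ_Gen implementation, whose algorithms and parameters are not given on this page. The function names `generate_clustered_data` and `generate_queries`, the parameters `spread`, `correlation`, and `jitter`, the uniform placement of cluster centers, the Gaussian spread within clusters, and the single shared factor used to correlate dimensions are all assumptions made for illustration.

```python
import math
import random


def generate_clustered_data(n_points, n_dims, n_clusters,
                            spread=0.05, correlation=0.0, seed=0):
    """Return (points, centers); every coordinate is clipped to [0, 1].

    Hypothetical sketch, not HDDQ_Gen itself:
    - cluster centers are drawn uniformly (one possible cluster distribution),
    - objects are Gaussian around their center (one possible object distribution),
    - `correlation` in [0, 1] is the pairwise correlation of the noise
      across dimensions, induced by one shared random factor.
    """
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation must be in [0, 1]")
    rng = random.Random(seed)
    centers = [[rng.random() for _ in range(n_dims)]
               for _ in range(n_clusters)]
    shared_w = math.sqrt(correlation)
    own_w = math.sqrt(1.0 - correlation)
    points = []
    for i in range(n_points):
        center = centers[i % n_clusters]   # round-robin cluster assignment
        shared = rng.gauss(0.0, 1.0)       # factor common to all dimensions
        point = []
        for d in range(n_dims):
            noise = shared_w * shared + own_w * rng.gauss(0.0, 1.0)
            value = center[d] + spread * noise
            point.append(min(1.0, max(0.0, value)))
        points.append(point)
    return points, centers


def generate_queries(points, n_queries, jitter=0.01, seed=1):
    """Draw query points that follow the data distribution.

    Each query is a randomly chosen data point plus small Gaussian jitter,
    so dense regions of the data receive proportionally more queries.
    """
    rng = random.Random(seed)
    queries = []
    for _ in range(n_queries):
        base = rng.choice(points)
        queries.append([min(1.0, max(0.0, x + rng.gauss(0.0, jitter)))
                        for x in base])
    return queries


if __name__ == "__main__":
    data, centers = generate_clustered_data(1000, 16, 5, correlation=0.5)
    queries = generate_queries(data, 20)
    print(len(data), len(centers), len(queries))
```

Sampling queries from perturbed data points is only one way to make the query distribution depend on the data distribution. A generator aimed at a specific application would likely expose the per-cluster and per-dimension distributions as user-controlled parameters, as the abstract suggests.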

This research was supported by the MIC, Korea, under the ITRC support program supervised by the IITA (IITA-2005-C1090-0502-0009).

Editor information

Jan van Leeuwen, Giuseppe F. Italiano, Wiebe van der Hoek, Christoph Meinel, Harald Sack, František Plášil

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Kim, SW., Yoon, SH., Lee, SC., Lee, J., Shin, M. (2007). Generating High Dimensional Data and Query Sets. In: van Leeuwen, J., Italiano, G.F., van der Hoek, W., Meinel, C., Sack, H., Plášil, F. (eds) SOFSEM 2007: Theory and Practice of Computer Science. SOFSEM 2007. Lecture Notes in Computer Science, vol 4362. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69507-3_30

  • DOI: https://doi.org/10.1007/978-3-540-69507-3_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69506-6

  • Online ISBN: 978-3-540-69507-3

  • eBook Packages: Computer Science, Computer Science (R0)
