Generating High Dimensional Data and Query Sets

Kim, Sang-Wook; Yoon, Seok-Ho; Lee, Sang-Cheol; Lee, Junghoon; Shin, Miyoung

doi:10.1007/978-3-540-69507-3_30

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4362)

Included in the following conference series:

International Conference on Current Trends in Theory and Practice of Computer Science

Abstract

Previous research on multidimensional indexes has typically evaluated performance on synthetic data sets distributed uniformly or normally over a multidimensional space. Such data sets hardly reflect the characteristics of multimedia database applications. In this paper, we discuss issues in generating high-dimensional data and query sets that address this problem. We first identify the requirements that data and query sets must meet for a fair performance evaluation of multidimensional indexes, and then propose HDDQ_Gen (High-Dimensional Data and Query Generator), which satisfies these requirements. HDDQ_Gen has the following features: (1) clustered distribution, (2) various object distributions within each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) a query distribution that depends on the data distribution. Using these features, users can control the distribution characteristics of data and query sets to suit their target applications.
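The five features listed in the abstract can be illustrated with a small sketch. This is not the authors' HDDQ_Gen, whose algorithm and parameters are not given on this page. It is a minimal illustration under stated assumptions: cluster centers are drawn uniformly from the unit hypercube, objects in a cluster are either normal or uniform around the center, correlation between dimensions comes from a single shared latent factor (so every pair of dimensions has correlation rho in the normal case), and queries are made by perturbing randomly chosen data points so that they follow the data distribution. The function names and all parameters (spread, rho, object_dist, jitter) are hypothetical.

```python
import math
import random


def generate_clustered_data(n_points, dim, n_clusters, spread=0.05,
                            rho=0.0, object_dist="normal", seed=0):
    """Return (points, labels, centers) for a clustered synthetic data set.

    Illustrative assumptions, not taken from the paper:
    - centers are uniform in [0, 1]^dim;
    - object_dist selects the per-cluster object distribution;
    - rho is the pairwise correlation between dimensions, applied only
      when object_dist == "normal".
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    rng = random.Random(seed)
    centers = [[rng.random() for _ in range(dim)] for _ in range(n_clusters)]
    points, labels = [], []
    for _ in range(n_points):
        c = rng.randrange(n_clusters)  # each cluster is equally likely
        if object_dist == "normal":
            # One shared factor plus independent noise gives unit-variance
            # offsets whose pairwise correlation is rho.
            shared = rng.gauss(0.0, 1.0)
            offsets = [math.sqrt(rho) * shared
                       + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                       for _ in range(dim)]
        elif object_dist == "uniform":
            # Independent offsets in [-1, 1]; rho is ignored here.
            offsets = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        else:
            raise ValueError("object_dist must be 'normal' or 'uniform'")
        points.append([centers[c][j] + spread * offsets[j]
                       for j in range(dim)])
        labels.append(c)
    return points, labels, centers


def generate_queries(points, n_queries, jitter=0.01, seed=1):
    """Draw query points that follow the data distribution.

    Each query is a randomly chosen data point plus small Gaussian noise,
    so dense regions of the data receive proportionally more queries.
    """
    rng = random.Random(seed)
    queries = []
    for _ in range(n_queries):
        base = rng.choice(points)
        queries.append([x + rng.gauss(0.0, jitter) for x in base])
    return queries
```

A call such as `generate_clustered_data(10000, 16, 8, rho=0.5)` followed by `generate_queries(points, 100)` would yield a 16-dimensional data set with eight clusters and a query set concentrated where the data is dense. A real generator would likely expose more choices, for example non-uniform cluster placement or per-cluster sizes.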

This research was supported by the MIC, Korea, under the ITRC support program supervised by the IITA (IITA-2005-C1090-0502-0009).

Author information

Authors and Affiliations

School of Information and Communications, Hanyang University
Sang-Wook Kim, Seok-Ho Yoon & Sang-Cheol Lee
Dept. of Computer Science and Statistics, Cheju National University
Junghoon Lee
School of Electrical Engineering and Computer Science, Kyoungpook National University
Miyoung Shin

Editor information

Jan van Leeuwen, Giuseppe F. Italiano, Wiebe van der Hoek, Christoph Meinel, Harald Sack, František Plášil

About this paper

Cite this paper

Kim, SW., Yoon, SH., Lee, SC., Lee, J., Shin, M. (2007). Generating High Dimensional Data and Query Sets. In: van Leeuwen, J., Italiano, G.F., van der Hoek, W., Meinel, C., Sack, H., Plášil, F. (eds) SOFSEM 2007: Theory and Practice of Computer Science. SOFSEM 2007. Lecture Notes in Computer Science, vol 4362. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69507-3_30

DOI: https://doi.org/10.1007/978-3-540-69507-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69506-6
Online ISBN: 978-3-540-69507-3
eBook Packages: Computer Science, Computer Science (R0)
