Abstract
Previous research on multidimensional indexes has typically used synthetic data sets distributed uniformly or normally over multidimensional space for performance evaluation. Such data sets hardly reflect the characteristics of multimedia database applications. In this paper, we discuss issues in generating high-dimensional data and query sets that address this problem. We first identify the requirements that data and query sets must meet for fair performance evaluation of multidimensional indexes, and then propose HDDQ_Gen (High-Dimensional Data and Query Generator), which satisfies these requirements. HDDQ_Gen has the following features: (1) clustered distribution, (2) various object distributions within each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) query distribution that depends on the data distribution. Using these features, users can control the distribution characteristics of data and query sets to suit their target applications.
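The five features listed above can be illustrated with a small sketch. This is not the HDDQ_Gen algorithm, whose details are not given in the abstract. It is a minimal illustration under assumed choices: the function names, the parameters (`spread`, `correlation`, `center_dist`, `object_dist`), the unit hypercube as data space, and the shared-factor way of correlating dimensions are all hypothetical.

```python
import random


def clamp(x):
    """Keep a coordinate inside the assumed data space [0, 1]."""
    return min(1.0, max(0.0, x))


def make_centers(num_clusters, dim, rng, center_dist="uniform"):
    """Place cluster centers uniformly, or normally around the middle of the space."""
    centers = []
    for _ in range(num_clusters):
        if center_dist == "uniform":
            centers.append([rng.random() for _ in range(dim)])
        else:  # "normal"
            centers.append([clamp(rng.gauss(0.5, 0.15)) for _ in range(dim)])
    return centers


def sample_point(center, rng, spread=0.05, object_dist="normal", correlation=0.0):
    """Draw one point around a center.

    Assumption: `correlation` in [0, 1] is the weight of a factor shared by
    all dimensions; a larger weight makes the offsets move together.
    """
    def draw():
        return rng.gauss(0.0, 1.0) if object_dist == "normal" else rng.uniform(-1.0, 1.0)

    shared = draw()
    point = []
    for c in center:
        offset = spread * (correlation * shared + (1.0 - correlation) * draw())
        point.append(clamp(c + offset))
    return point


def generate(num_points, num_queries, dim, num_clusters, seed=0,
             center_dist="uniform", object_dist="normal", correlation=0.0):
    """Return (data, queries); queries are drawn from the same clusters as the data."""
    rng = random.Random(seed)
    centers = make_centers(num_clusters, dim, rng, center_dist)

    def draw_many(n):
        return [sample_point(rng.choice(centers), rng,
                             object_dist=object_dist, correlation=correlation)
                for _ in range(n)]

    return draw_many(num_points), draw_many(num_queries)


if __name__ == "__main__":
    data, queries = generate(1000, 50, dim=16, num_clusters=5, correlation=0.5)
    print(len(data), len(queries), len(data[0]))
```

Drawing the queries from the same cluster centers as the data is one simple way to make the query distribution follow the data distribution; a real generator would likely expose more control over each of these choices.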
This research was supported by the MIC, Korea, under the ITRC support program supervised by the IITA (IITA-2005-C1090-0502-0009).
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Kim, SW., Yoon, SH., Lee, SC., Lee, J., Shin, M. (2007). Generating High Dimensional Data and Query Sets. In: van Leeuwen, J., Italiano, G.F., van der Hoek, W., Meinel, C., Sack, H., Plášil, F. (eds) SOFSEM 2007: Theory and Practice of Computer Science. SOFSEM 2007. Lecture Notes in Computer Science, vol 4362. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69507-3_30
DOI: https://doi.org/10.1007/978-3-540-69507-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69506-6
Online ISBN: 978-3-540-69507-3
eBook Packages: Computer Science, Computer Science (R0)