Bulk Loading the MKL-Tree

  • Annalisa Franco
  • Alessandra Lumini
  • Dario Maio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2736)

Abstract

The MKL-tree is a hierarchical, height-balanced structure for indexing high-dimensional data. It represents data in a lower-dimensional space by means of the MKL transform, a multi-space generalization of the KL transform. A local dimensionality reduction is performed at each node of the tree, which allows more selective features to be extracted and thus increases the discriminating power of the index. The dynamic version of the MKL-tree has two main drawbacks. First, incremental loading of data points can produce very different structures, and consequently different query performance, depending on the insertion order. Second, building the index can be very expensive because of the large number of updates required. In real applications a large dataset is usually available when the tree is created, so we propose a new bulk loading technique for the MKL-tree based on a recursive clustering of the data objects. The new algorithm searches for an optimal partitioning of the data points in order to compute the most suitable KL subspaces to represent the dataset.
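The abstract describes the bulk loader only at a high level: a recursive clustering of the data, with a KL subspace computed for each node. The following is a minimal, dependency-free Python sketch under stated assumptions. It is not the authors' algorithm. Each node keeps a single KL subspace, made of the leading eigenvectors of its points' covariance matrix and found here by power iteration. The children of a node therefore supply one subspace per cluster, which only loosely mirrors the multi-space (MKL) idea. Plain k-means stands in for the paper's search for an optimal partitioning, which the abstract does not specify. The names `bulk_load`, `kl_basis`, `fanout`, `leaf_capacity` and `sub_dim` are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    mean: list                  # centroid of the points under this node
    basis: list                 # orthonormal KL basis vectors (one per row)
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)  # filled only at leaves

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(pts):
    return [sum(p[i] for p in pts) / len(pts) for i in range(len(pts[0]))]

def covariance(pts, mu):
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for p in pts:
        diff = [p[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / len(pts)
    return cov

def kl_basis(pts, k, iters=100):
    """Mean and k leading covariance eigenvectors (the KL basis),
    found by power iteration with deflation."""
    mu, d = centroid(pts), len(pts[0])
    cov = covariance(pts, mu)
    rng, basis = random.Random(0), []
    for _ in range(min(k, d)):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:    # no variance left to explain
                return mu, basis
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in cov])  # Rayleigh quotient
        basis.append(v)
        # Deflation: remove this component so the next run finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mu, basis

def project(x, mu, basis):
    """KL projection y = Phi^T (x - mu) onto a node's subspace."""
    centered = [xi - mi for xi, mi in zip(x, mu)]
    return [dot(b, centered) for b in basis]

def kmeans(pts, k, iters=20, seed=0):
    """Plain Lloyd-style k-means; returns only the non-empty clusters."""
    centers = random.Random(seed).sample(pts, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            best = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[best].append(p)
        centers = [centroid(g) if g else centers[j] for j, g in enumerate(groups)]
    return [g for g in groups if g]

def bulk_load(pts, fanout=3, leaf_capacity=4, sub_dim=2):
    """Top-down: fit a KL subspace to this node's points, then recursively
    cluster them into at most `fanout` children (hypothetical parameters)."""
    mu, basis = kl_basis(pts, sub_dim)
    node = Node(mean=mu, basis=basis)
    if len(pts) <= leaf_capacity:
        node.points = list(pts)
        return node
    # Assumption: k-means stands in for the paper's optimal-partitioning search.
    groups = kmeans(pts, min(fanout, len(pts)))
    if len(groups) < 2:         # degenerate clustering: split in half instead
        groups = [pts[: len(pts) // 2], pts[len(pts) // 2 :]]
    node.children = [bulk_load(g, fanout, leaf_capacity, sub_dim) for g in groups]
    return node

def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

# Usage: three well-separated 4-D clusters of 10 points each.
rng = random.Random(42)
data = [[rng.gauss(cx, 0.3) for _ in range(4)] for cx in (0.0, 5.0, 10.0) for _ in range(10)]
root = bulk_load(data)
print([len(leaf.points) for leaf in leaves(root)])
print(project(data[0], root.mean, root.basis))
```

The tree is built top-down from the complete dataset. As a result, each node's subspace is fitted once to the full set of points that node holds. Under incremental insertion, by contrast, subspaces would have to be updated as each point arrives.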

Experimental results show that bulk loading can significantly improve index performance compared with the incremental insertion procedure, both in the effectiveness of similarity searches and in the efficiency of the loading procedure.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Annalisa Franco (1)
  • Alessandra Lumini (1)
  • Dario Maio (1)
  1. DEIS, Università di Bologna, Bologna, Italy
