Cofe: A Scalable Method for Feature Extraction from Complex Objects

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1874)

Abstract

Feature Extraction, also known as Multidimensional Scaling, is a basic primitive associated with indexing, clustering, nearest neighbor searching and visualization. We consider the problem of feature extraction when the data points are complex and the distance function is very expensive to evaluate. Examples of expensive distance evaluations include computing the Hausdorff distance between polygons in a spatial database, or the edit distance between macromolecules in a DNA or protein database.
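To make the cost of such a distance concrete, the following is a minimal sketch of the unit-cost edit distance (Levenshtein distance) between two strings. The function name `edit_distance` and the unit costs are assumptions made for illustration; the abstract does not say which scoring scheme the authors use for macromolecules. The point is only that each evaluation takes time proportional to the product of the two sequence lengths, which is why the number of distance evaluations matters.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via dynamic programming.

    Runs in O(len(a) * len(b)) time, so a single evaluation on long
    sequences is already expensive. Illustrative only: real sequence
    comparison often uses weighted substitution and gap costs.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Computing all pairwise distances among n such objects would require on the order of n² of these evaluations, which is the cost a feature extraction method for complex objects tries to avoid.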

We propose Cofe, a method for sparse feature extraction based on novel random non-linear projections. We evaluate Cofe on real data and find that it performs very well in terms of the quality of the features extracted, the number of distances evaluated, the number of database scans performed, and the total run time. We further propose Cofe-GR, which matches Cofe in distance evaluations and run time but outperforms it in the quality of the features extracted.
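The abstract does not spell out how Cofe's projections are built. As a rough illustration of what a random non-linear projection can look like, the sketch below uses a generic reference-set (Lipschitz-style) embedding: each feature is the distance from an object to the nearest member of a randomly chosen reference set. This is an assumption chosen for illustration, not the authors' algorithm; the names `reference_set_embedding` and `hamming`, the parameters, and the toy data are all hypothetical, and the Hamming distance stands in for an expensive metric.

```python
import random


def reference_set_embedding(objects, dist, num_features, set_size, seed=0):
    """Map each object to a feature vector of length num_features.

    Feature i is the distance from the object to the nearest member of the
    i-th random reference set. Taking a minimum over a set makes the map
    non-linear; it only needs a distance function, not coordinates.
    """
    rng = random.Random(seed)
    reference_sets = [rng.sample(objects, set_size) for _ in range(num_features)]
    return [
        [min(dist(obj, ref) for ref in refs) for refs in reference_sets]
        for obj in objects
    ]


def hamming(a, b):
    """Cheap stand-in metric on equal-length strings (illustration only)."""
    return sum(ca != cb for ca, cb in zip(a, b))


# Hypothetical toy data: six short equal-length sequences.
sequences = ["ACGTAC", "ACGTTC", "TCGTAC", "GGGTAC", "ACGAAA", "TTTTTT"]
features = reference_set_embedding(sequences, hamming, num_features=3, set_size=2)
for seq, vec in zip(sequences, features):
    print(seq, vec)
```

With this construction, each object costs `num_features * set_size` distance evaluations rather than one per other object, and for any metric the difference between two objects in a single feature never exceeds their true distance (by the triangle inequality). How well the resulting features preserve distances depends on how the reference sets are chosen, which is the part a method such as Cofe would need to design carefully.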

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hristescu, G., Farach-Colton, M. (2000). Cofe: A Scalable Method for Feature Extraction from Complex Objects. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2000. Lecture Notes in Computer Science, vol 1874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44466-1_36

  • DOI: https://doi.org/10.1007/3-540-44466-1_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67980-6

  • Online ISBN: 978-3-540-44466-4

  • eBook Packages: Springer Book Archive
