Search and Classification of High Dimensional Data

  • Conference paper
Approximation Algorithms for Combinatorial Optimization (APPROX 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2462)

Abstract

Modeling data sets as points in a high dimensional vector space is a trendy theme in modern information retrieval and data mining. Among the numerous drawbacks of this approach is the fact that many of the required processing tasks are computationally hard in high dimension. We survey several algorithmic ideas that have applications to the design and analysis of polynomial time approximation schemes for nearest neighbor search and clustering of high dimensional data. The main lesson from this line of research is that if one is willing to settle for approximate solutions, then high dimensional geometry is easy. Examples are included in the reference list below.

References

  1. N. Alon, S. Dar, M. Parnas, and D. Ron. Testing of clustering. In Proc. of the 41st Ann. IEEE Symp. on Foundations of Computer Science, 2000, pages 240–250.

  2. M. Bădoiu, S. Har-Peled, and P. Indyk. Approximate clustering via core-sets. In Proc. of the 34th Ann. ACM Symp. on Theory of Computing, 2002.

  3. P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Clustering in large graphs and matrices. In Proc. of the 10th Ann. ACM-SIAM Symp. on Discrete Algorithms, 1999, pages 291–299.

  4. W. Fernandez de la Vega, M. Karpinski, C. Kenyon, and Y. Rabani. Polynomial time approximation schemes for metric min-sum clustering. Electronic Colloquium on Computational Complexity report number TR02-025. Available at ftp://ftp.eccc.uni-trier.de/pub/eccc/reports/2002/TR02-025/index.html

  5. S. Har-Peled and K.R. Varadarajan. Projective clustering in high dimensions using core-sets. In Proc. of the 18th Ann. ACM Symp. on Computational Geometry, 2002, pages 312–318.

  6. P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. of the 30th Ann. ACM Symp. on Theory of Computing, 1998, pages 604–613.

  7. J. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In Proc. of the 29th Ann. ACM Symp. on Theory of Computing, 1997, pages 599–608.

  8. E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput., 30(2):457–474, 2000. Preliminary version appeared in STOC ’98.

  9. N. Mishra, D. Oblinger, and L. Pitt. Sublinear time approximate clustering. In Proc. of the 12th Ann. ACM-SIAM Symp. on Discrete Algorithms, January 2001, pages 439–447.

  10. R. Ostrovsky and Y. Rabani. Polynomial time approximation schemes for geometric clustering problems. J. of the ACM, 49(2):139–156, March 2002. Preliminary version appeared in FOCS ’00.

  11. L.J. Schulman. Clustering for edge-cost minimization. In Proc. of the 32nd Ann. ACM Symp. on Theory of Computing, 2000, pages 547–555.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rabani, Y. (2002). Search and Classification of High Dimensional Data. In: Jansen, K., Leonardi, S., Vazirani, V. (eds) Approximation Algorithms for Combinatorial Optimization. APPROX 2002. Lecture Notes in Computer Science, vol 2462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45753-4_1

  • DOI: https://doi.org/10.1007/3-540-45753-4_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44186-1

  • Online ISBN: 978-3-540-45753-4

  • eBook Packages: Springer Book Archive
