Local Intrinsic Dimensionality I: An Extreme-Value-Theoretic Foundation for Similarity Applications

  • Conference paper
Similarity Search and Applications (SISAP 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10609)

Abstract

Researchers have long considered the analysis of similarity applications in terms of the intrinsic dimensionality (ID) of the data. This theory paper is concerned with a generalization of a discrete measure of ID, the expansion dimension, to the case of smooth functions in general, and distance distributions in particular. A local model of the ID of smooth functions is first proposed and then explained within the well-established statistical framework of extreme value theory (EVT). Moreover, it is shown that under appropriate smoothness conditions, the cumulative distribution function of a distance distribution can be completely characterized by an equivalent notion of data discriminability. As the local ID model makes no assumptions on the nature of the function (or distribution) other than continuous differentiability, its extreme generality makes it ideally suited for the non-parametric or unsupervised learning tasks that often arise in similarity applications. An extension of the local ID model is also provided that allows the local assessment of the rate of change of function growth, which is then shown to have potential implications for the detection of inliers and outliers.
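The local ID model can be illustrated numerically. The sketch below is not taken from the abstract. It assumes that the local ID of a smooth function F at radius r is the ratio r·F'(r)/F(r), and it uses a Hill-type maximum-likelihood estimator to recover the same quantity from a sample of distances. The names `local_id` and `mle_lid` are hypothetical, and the synthetic distances are drawn from an assumed distribution F(r) = r^m on (0, 1].

```python
import math
import random


def local_id(F, r, h=1e-6):
    """Approximate r * F'(r) / F(r) using a central difference for F'(r).

    Assumption: this ratio is taken as the local ID of F at r.
    """
    dF = (F(r + h) - F(r - h)) / (2.0 * h)
    return r * dF / F(r)


def mle_lid(distances):
    """Hill-type maximum-likelihood estimate from positive distances.

    Uses the largest distance w as the neighborhood radius and returns
    -1 / mean(ln(d / w)).
    """
    w = max(distances)
    mean_log = sum(math.log(d / w) for d in distances) / len(distances)
    return -1.0 / mean_log


# Ball volume in 3 dimensions grows like r^3, so the ratio equals 3 at any r.
cube_id = local_id(lambda r: r ** 3, 0.5)

# Synthetic distances with F(r) = r^m on (0, 1], drawn by inverse-CDF sampling.
# 1.0 - rng.random() lies in (0, 1], which avoids taking log(0).
rng = random.Random(0)
m = 5
sample = [(1.0 - rng.random()) ** (1.0 / m) for _ in range(20000)]
estimate = mle_lid(sample)

print(f"ratio for F(r) = r^3 at r = 0.5: {cube_id:.4f}")
print(f"estimate from {len(sample)} sampled distances (m = {m}): {estimate:.2f}")
```

For a power-law F the ratio is constant in r, which is why the first value matches the exponent exactly. For a general distance distribution the ratio varies with r, and the quantity of interest is its limit as r tends to zero.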

Acknowledgments

The author gratefully acknowledges the financial support of JSPS Kakenhi Kiban (A) Research Grant 25240036 and JSPS Kakenhi Kiban (B) Research Grant 15H02753.

Author information

Corresponding author

Correspondence to Michael E. Houle.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Houle, M.E. (2017). Local Intrinsic Dimensionality I: An Extreme-Value-Theoretic Foundation for Similarity Applications. In: Beecks, C., Borutta, F., Kröger, P., Seidl, T. (eds) Similarity Search and Applications. SISAP 2017. Lecture Notes in Computer Science, vol 10609. Springer, Cham. https://doi.org/10.1007/978-3-319-68474-1_5

  • DOI: https://doi.org/10.1007/978-3-319-68474-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68473-4

  • Online ISBN: 978-3-319-68474-1

  • eBook Packages: Computer Science, Computer Science (R0)
