
On the Correlation Between Local Intrinsic Dimensionality and Outlierness

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11223)

Abstract

Data mining methods for outlier detection are usually based on some variant of non-parametric density estimation. Here we argue for the use of local intrinsic dimensionality as a measure of outlierness and demonstrate empirically that it is a meaningful alternative and complement to classic methods.
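The abstract does not state which LID estimator is used. As a minimal sketch, assuming the common maximum-likelihood (Hill-type) estimator over the distances d_1 <= ... <= d_k from a point to its k nearest neighbors, LID(x) is approximated by -((1/k) * sum_i ln(d_i / d_k))^(-1), and assuming that a larger estimate is read as more outlying, a per-point score could be computed as follows. The helper names (knn_distances, lid_mle, lid_outlier_scores) are hypothetical, the neighbor search is brute force, and this is not presented as the paper's implementation.

```python
import math
from typing import List, Sequence


def knn_distances(points: Sequence[Sequence[float]], idx: int, k: int) -> List[float]:
    """Sorted Euclidean distances from points[idx] to its k nearest neighbors.

    Brute force, O(n) per query; the query point itself is excluded.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    query = points[idx]
    dists = sorted(math.dist(query, p) for j, p in enumerate(points) if j != idx)
    return dists[:k]


def lid_mle(dists: Sequence[float]) -> float:
    """Maximum-likelihood (Hill-type) LID estimate from neighbor distances.

    Computes  LID = -( (1/k) * sum_i ln(d_i / d_k) )^(-1),
    where d_k is the largest of the k given distances.
    """
    if not dists or min(dists) <= 0:
        # Duplicate points (zero distance) would make ln(d_i / d_k) undefined.
        raise ValueError("distances must be non-empty and strictly positive")
    w = max(dists)
    mean_log = sum(math.log(d / w) for d in dists) / len(dists)
    if mean_log == 0.0:
        # All k distances are equal: the estimate diverges.
        return math.inf
    return -1.0 / mean_log


def lid_outlier_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Hypothetical helper: one LID estimate per point, used as an outlier score."""
    return [lid_mle(knn_distances(points, i, k)) for i in range(len(points))]


if __name__ == "__main__":
    # Toy data: ten points on a line in 2-D plus one point lifted off the line.
    data = [(float(x), 0.0) for x in range(10)] + [(5.0, 5.0)]
    scores = lid_outlier_scores(data, k=3)
    for p, s in zip(data, scores):
        print(p, round(s, 3))
```

In this toy data the point lifted off the line receives a far larger estimate than the points on the line. For real data sets, an indexed k-nearest-neighbor search would replace the quadratic brute-force loop, and the choice of k strongly affects the variance of the estimate.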





Acknowledgments

M. E. Houle supported by JSPS Kakenhi Kiban (B) Research Grant 18H03296.

Author information

Corresponding author

Correspondence to Erich Schubert.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Houle, M.E., Schubert, E., Zimek, A. (2018). On the Correlation Between Local Intrinsic Dimensionality and Outlierness. In: Marchand-Maillet, S., Silva, Y., Chávez, E. (eds) Similarity Search and Applications. SISAP 2018. Lecture Notes in Computer Science, vol 11223. Springer, Cham. https://doi.org/10.1007/978-3-030-02224-2_14

  • DOI: https://doi.org/10.1007/978-3-030-02224-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02223-5

  • Online ISBN: 978-3-030-02224-2

  • eBook Packages: Computer Science, Computer Science (R0)
