Abstract
The local intrinsic dimensionality (LID) model enables assessment of the complexity of the local neighbourhood around a specific query object of interest. In this paper, we study how the LID of a query varies across different subspaces and local neighbourhoods. We illustrate the surprising phenomenon that the LID of a query can substantially decrease as further features are included in a dataset. We identify two key feature properties that influence the LID of feature combinations: correlation and dominance. Our investigation provides new insights into the impact of different feature combinations on local regions of the data.
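The idea of a query's LID varying across subspaces can be made concrete with an estimator. The sketch below assumes Euclidean distance and uses the standard maximum-likelihood (Hill-type) LID estimator on the query's k nearest neighbours. It is not necessarily the estimator, metric, or data used in this paper. The synthetic dataset, the helper names `subspace_distance` and `lid_mle`, and the choice k = 100 are hypothetical, chosen only for illustration.

```python
import math
import random
from typing import List, Sequence


def subspace_distance(a: Sequence[float], b: Sequence[float], features: Sequence[int]) -> float:
    """Euclidean distance between a and b restricted to the given feature indices (assumed metric)."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))


def lid_mle(query: Sequence[float], data: List[Sequence[float]],
            features: Sequence[int], k: int = 100) -> float:
    """Hill-type MLE of the LID of `query` in the subspace spanned by `features`.

    With the k smallest nonzero distances r_1 <= ... <= r_k:
        LID = -( (1/k) * sum_i ln(r_i / r_k) )^(-1)
    """
    dists = sorted(d for d in (subspace_distance(query, x, features) for x in data) if d > 0)
    if len(dists) < k:
        raise ValueError("need at least k points at nonzero distance from the query")
    r_k = dists[k - 1]
    mean_log = sum(math.log(r / r_k) for r in dists[:k]) / k
    if mean_log == 0.0:  # all k distances identical: the estimator is undefined
        return math.inf
    return -1.0 / mean_log


# Hypothetical synthetic data: features 0 and 1 are independent uniforms on [0, 1),
# and feature 2 is an exact copy of feature 0 (perfect correlation).
rng = random.Random(0)
data = []
for _ in range(2000):
    u, v = rng.random(), rng.random()
    data.append((u, v, u))
query = (0.5, 0.5, 0.5)

lid_1d = lid_mle(query, data, [0])       # single feature: expected near 1
lid_2d = lid_mle(query, data, [0, 1])    # plus an independent feature: expected near 2
lid_corr = lid_mle(query, data, [0, 2])  # plus a perfectly correlated feature: expected near 1
print(f"{lid_1d:.2f} {lid_2d:.2f} {lid_corr:.2f}")
```

Under these assumptions, adding an independent feature should roughly double the estimate, while adding a duplicated feature leaves it near 1, because the neighbours still lie along a line. This is one simple way correlation between features can shape the LID of a feature combination; it does not reproduce the paper's experiments.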
Notes
- 1.
- 2.
- 3.
Suppose \(q = 0 \in \mathbb{R}\) and \(x_1 = 2 \in X\) are one-dimensional data values. Then, since q lies at the origin, \(x_1\) directly represents the distance from q to \(x_1\) along the X axis, namely \(|x_1 - q| = 2\).
- 4.
In fact, our model allows \(F_X\) (or \(F_Y\)) to be a set of features rather than a single feature, but for simplicity we present it in the context of a single feature.
- 5.
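Footnotes 3 and 4 can be illustrated with a small sketch, assuming Euclidean distance (the notes do not fix the metric). Apart from \(q = 0\) and \(x_1 = 2\), the coordinates, the index lists standing in for \(F_X\) and \(F_Y\), and the helper `subspace_distance` are hypothetical. Because \(F_X\) may be a set of features, distance is computed over whatever feature indices the set contains.

```python
import math
from typing import Sequence


def subspace_distance(q: Sequence[float], x: Sequence[float], features: Sequence[int]) -> float:
    """Euclidean distance between q and x using only the feature indices in `features`."""
    return math.sqrt(sum((x[f] - q[f]) ** 2 for f in features))


# Footnote 3: with q at the origin of a single feature X, the value x_1 = 2 is itself the distance.
q = (0.0, 0.0)
x = (2.0, 3.0)  # hypothetical second coordinate 3.0
F_X = [0]       # hypothetical feature set F_X = {feature 0}
F_Y = [1]       # hypothetical feature set F_Y = {feature 1}

d_x = subspace_distance(q, x, F_X)         # distance within F_X
d_y = subspace_distance(q, x, F_Y)         # distance within F_Y
d_xy = subspace_distance(q, x, F_X + F_Y)  # distance in the combined subspace

# For disjoint feature sets, squared Euclidean distances add.
assert math.isclose(d_xy ** 2, d_x ** 2 + d_y ** 2)
print(d_x, d_y, round(d_xy, 4))  # → 2.0 3.0 3.6056
```

Under the Euclidean assumption, the combined distance is never smaller than either per-set distance, and a feature set whose distances are on a much larger scale contributes most of the sum. This gives one intuitive reading of how a single feature can dominate a combination.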
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hashem, T., Rashidi, L., Bailey, J., Kulik, L. (2019). Characteristics of Local Intrinsic Dimensionality (LID) in Subspaces: Local Neighbourhood Analysis. In: Amato, G., Gennaro, C., Oria, V., Radovanović, M. (eds) Similarity Search and Applications. SISAP 2019. Lecture Notes in Computer Science, vol 11807. Springer, Cham. https://doi.org/10.1007/978-3-030-32047-8_22
DOI: https://doi.org/10.1007/978-3-030-32047-8_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32046-1
Online ISBN: 978-3-030-32047-8
eBook Packages: Computer Science; Computer Science (R0)