Local Intrinsic Dimensionality Based Features for Clustering

  • Paola Campadelli
  • Elena Casiraghi
  • Claudio Ceruti
  • Gabriele Lombardi
  • Alessandro Rozza
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8156)


One of the fundamental tasks of unsupervised learning is dataset clustering: partitioning the input dataset into clusters composed of “similar” objects that “differ” from the objects belonging to other clusters. In this paper we assume that the different clusters are drawn from different, possibly intersecting, geometrical structures represented by manifolds embedded in a possibly higher dimensional space. Each manifold is characterized by its intrinsic dimensionality, which may differ from the intrinsic dimensionalities of the other manifolds. Under these assumptions, we encode the input data by means of local intrinsic dimensionality estimates and features related to them, and then apply simple, basic clustering algorithms, since our interest is specifically in assessing the discriminative power of the proposed features. Their encouraging discriminative quality is shown by a feature relevance test, by the clustering results achieved on both synthetic and real datasets, and by a comparison with the results obtained by related and classical state-of-the-art clustering approaches.
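
The idea of using a local intrinsic dimensionality estimate as a clustering feature can be illustrated with a small sketch. The abstract does not say which estimator or which related features the paper uses, so everything below is an assumption made for illustration: the Levina-Bickel maximum likelihood estimator stands in for the local estimator, a two-cluster Lloyd (k-means) iteration on that single scalar feature stands in for the "simple and basic" clustering step, and the function names (`local_id_mle`, `kmeans_1d`, `make_toy_data`) and the toy data (a line segment crossing a planar patch in 3-D) are hypothetical.

```python
import numpy as np


def local_id_mle(X, k=20):
    """Per-point intrinsic dimensionality via the Levina-Bickel MLE.

    Illustrative stand-in estimator; brute-force neighbours, so only
    suitable for small datasets without duplicate points.
    """
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row; column 0 is the point itself (distance 0), so skip it.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    # m_k(x) = [ 1/(k-1) * sum_{j<k} log(T_k(x) / T_j(x)) ]^{-1}
    log_ratio = np.log(knn[:, -1:] / knn[:, :-1])
    return (k - 1) / log_ratio.sum(axis=1)


def kmeans_1d(values, n_iter=50):
    """Two-cluster Lloyd iterations on a scalar feature (deterministic init)."""
    centers = np.percentile(values, [25, 75])
    for _ in range(n_iter):
        # Assign each value to its nearest centre, then recompute the centres.
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        centers = np.array([values[labels == c].mean() for c in (0, 1)])
    return labels


def make_toy_data(n=300, seed=0):
    """A 1-D segment crossing a 2-D patch, both embedded in 3-D (toy data)."""
    rng = np.random.default_rng(seed)
    line = np.column_stack(
        [np.full(n, 0.5), np.full(n, 0.5), rng.uniform(-1, 1, n)]
    )
    plane = np.column_stack(
        [rng.uniform(0, 1, n), rng.uniform(0, 1, n), np.zeros(n)]
    )
    return np.vstack([line, plane]), np.repeat([0, 1], n)


X, y = make_toy_data()
id_feature = local_id_mle(X, k=20)   # one scalar feature per point
labels = kmeans_1d(id_feature)       # cluster on that feature alone
# Cluster indices are arbitrary, so score both possible label matchings.
agreement = max((labels == y).mean(), (labels != y).mean())
print("mean local ID, line :", id_feature[y == 0].mean())
print("mean local ID, plane:", id_feature[y == 1].mean())
print("agreement with the generating manifold:", agreement)
```

Per-point estimates of this kind are noisy, and points near the intersection see neighbours from both manifolds, so the sketch is only meant to show why the feature can carry discriminative information, not to reproduce the paper's results.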


Local features, Intrinsic dimensionality, Dataset clustering, Multi-manifold structures


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Paola Campadelli (1)
  • Elena Casiraghi (1)
  • Claudio Ceruti (2)
  • Gabriele Lombardi (1)
  • Alessandro Rozza (3)
  1. Dipartimento di Informatica, Università degli Studi di Milano, Milano, Italy
  2. Dipartimento di Matematica, Università degli Studi di Milano, Milano, Italy
  3. Dipartimento di Scienze Applicate, Università degli Studi di Napoli Parthenope Centro Direzionale di Napoli, Napoli, Italy
