
Harmonic Analysis of Digital Data Bases

  • Ronald R. Coifman
  • Matan Gavish
Chapter
Part of the Applied and Numerical Harmonic Analysis book series (ANHA)

Abstract

Digital databases can be represented by matrices, where rows (say) correspond to numerical sensor readings, or features, and columns correspond to data points. Recent data analysis methods describe the local geometry of the data points using a weighted affinity graph whose vertices correspond to data points. We consider two geometries, or graphs, one on the rows and one on the columns, such that the data matrix is smooth with respect to the “tensor product” of the two geometries. This is achieved by an iterative procedure that constructs a multiscale partition tree on each graph. We use the recently introduced notion of Haar-like bases induced by the trees to obtain tensor-Haar-like bases for the space of matrices, and show that an ℓ^p entropy condition on the expansion coefficients of the database, viewed as a function on the product of the geometries, implies both smoothness and efficient reconstruction. We apply this methodology to analyze, de-noise and compress a term-document database. We use the same methodology to compress matrices of potential operators with unknown charge-distribution geometries and to organize Laplacian eigenvectors, where the data matrix is the “expansion in Laplace eigenvectors” operator.
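
The following is a minimal sketch, not taken from the chapter, of the alternating row/column organization the abstract describes. It uses SciPy's average-linkage hierarchical clustering as a stand-in for the multiscale partition-tree construction, and all function names, parameters, and the choice of cluster averaging are illustrative assumptions rather than the authors' algorithm.

```python
# Sketch: alternately organize rows and columns of a data matrix so that it
# becomes smooth with respect to the tensor product of the two geometries.
# Hierarchical clustering stands in for the partition-tree construction.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist


def partition_tree(vectors, levels=3):
    """Coarse-to-fine flat partitions of the given vectors (a stand-in for a
    multiscale partition tree); returns one label array per level."""
    Z = linkage(pdist(vectors), method="average")
    return [fcluster(Z, t=2 ** k, criterion="maxclust") for k in range(1, levels + 1)]


def smooth_by_partition(A, labels, axis):
    """Replace each cluster of rows (axis=0) or columns (axis=1) of A by its
    average, so the other axis can be re-clustered in smoothed coordinates."""
    out = np.empty_like(A, dtype=float)
    for c in np.unique(labels):
        idx = labels == c
        if axis == 0:
            out[idx, :] = A[idx, :].mean(axis=0)
        else:
            out[:, idx] = A[:, idx].mean(axis=1, keepdims=True)
    return out


def iterate_row_col_trees(A, n_iter=3, levels=3):
    """Alternately build a column tree from the row geometry and a row tree
    from the column-smoothed matrix (the iterative procedure, sketched)."""
    col_labels = partition_tree(A.T, levels)[-1]   # columns = data points
    row_labels = partition_tree(A, levels)[-1]     # rows = features
    for _ in range(n_iter):
        row_labels = partition_tree(smooth_by_partition(A, col_labels, axis=1), levels)[-1]
        col_labels = partition_tree(smooth_by_partition(A, row_labels, axis=0).T, levels)[-1]
    return row_labels, col_labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 60))   # 40 features (rows) x 60 data points (columns)
    rows, cols = iterate_row_col_trees(A)
    print(np.unique(rows).size, "row folders,", np.unique(cols).size, "column folders")
```

In the chapter's setting, a Haar-like basis would then be built from differences of averages across sibling folders of each tree, and the tensor products of the row and column basis functions would give the expansion whose coefficient decay governs de-noising and compression; that step is omitted from this sketch.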

Keywords

Data Matrix, Iterative Procedure, Potential Operator, Fast Multipole Method, Partition Tree



Acknowledgements

We are indebted to our collaborator Boaz Nadler for many insights regarding Haar-like bases. We also thank Rob Tibshirani and Fred Warner for their helpful comments. M.G. is supported by a William R. and Sara Hart Kimball Stanford Graduate Fellowship and is grateful for the hospitality of the Yale Applied Math program during the preparation of this work.


Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Statistics, Stanford University, Stanford, USA
  2. Program in Applied Mathematics, Yale University, New Haven, USA
