Harmonic Analysis of Digital Data Bases
Digital databases can be represented by matrices, where rows (say) correspond to numerical sensor readings, or features, and columns correspond to data points. Recent data analysis methods describe the local geometry of the data points using a weighted affinity graph, whose vertices correspond to data points. We consider two geometries, or graphs – one on the rows and one on the columns, such that the data matrix is smooth with respect to the “tensor product” of the two geometries. This is achieved by an iterative procedure that constructs a multiscale partition tree on each graph. We use the recently introduced notion of Haar-like bases induced by the trees to obtain Tensor-Haar-like bases for the space of matrices, and show that an ℓ^p entropy condition on the expansion coefficients of the database, viewed as a function on the product of the geometries, implies both smoothness and efficient reconstruction. We apply this methodology to analyze, de-noise and compress a term-document database. We use the same methodology to compress matrices of potential operators of unknown charge distribution geometries and to organize Laplacian eigenvectors, where the data matrix is the “expansion in Laplace eigenvectors” operator.
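The following minimal Python/NumPy sketch illustrates the expansion step only. It assumes the row and column partition trees are already given (the iterative tree-building procedure is not shown) and encodes each tree as a nested list whose leaves are integer indices, which is a hypothetical format chosen for illustration. At a node with k children it builds k − 1 Haar-like vectors by contrasting the union of the earlier children with the next child; this is one of several valid orthonormal choices, not necessarily the authors' construction. The tensor-Haar coefficients are C = Ψ_rᵀ M Ψ_c. The helper names `haar_like_basis`, `tensor_haar_coefficients`, `lp_entropy` and `compress` are hypothetical, and `lp_entropy` uses the plain sum of |c|^p as a stand-in for the ℓ^p entropy, whose exact normalization in the paper may differ.

```python
import numpy as np


def leaves(node):
    """Indices of the points (integer leaves) below a tree node."""
    if isinstance(node, int):
        return [node]
    return [i for child in node for i in leaves(child)]


def haar_like_basis(tree, n):
    """Orthonormal Haar-like basis of R^n induced by a partition tree.

    `tree` is a nested list whose integer leaves are 0..n-1 (a hypothetical
    encoding). Columns of the returned n x n matrix are the basis vectors.
    """
    basis = [np.ones(n) / np.sqrt(n)]  # normalized constant on the root

    def visit(node):
        if isinstance(node, int):
            return
        merged = list(leaves(node[0]))
        for child in node[1:]:
            g = leaves(child)
            a, b = len(merged), len(g)
            # Constant on `merged` and on `g`, zero-sum, unit l2 norm.
            psi = np.zeros(n)
            psi[merged] = 1.0 / a
            psi[g] = -1.0 / b
            basis.append(psi * np.sqrt(a * b / (a + b)))
            merged.extend(g)
        for child in node:
            visit(child)

    visit(tree)
    return np.column_stack(basis)


def tensor_haar_coefficients(M, row_tree, col_tree):
    """Coefficients of M in the tensor product of the row and column bases."""
    Pr = haar_like_basis(row_tree, M.shape[0])
    Pc = haar_like_basis(col_tree, M.shape[1])
    return Pr.T @ M @ Pc, Pr, Pc


def lp_entropy(C, p=1.0):
    """Sum of |c|^p over all coefficients (an l^p entropy-type measure)."""
    return float(np.sum(np.abs(C) ** p))


def compress(M, row_tree, col_tree, k):
    """Keep the k largest-magnitude tensor-Haar coefficients and reconstruct."""
    C, Pr, Pc = tensor_haar_coefficients(M, row_tree, col_tree)
    flat = C.ravel()
    keep = np.argsort(np.abs(flat))[::-1][:k]
    kept = np.zeros_like(flat)
    kept[keep] = flat[keep]
    # Both bases are orthonormal, so the inverse transform is Pr C Pc^T.
    return Pr @ kept.reshape(C.shape) @ Pc.T


if __name__ == "__main__":
    # Toy matrix that is constant on each (row block) x (column block) rectangle.
    tree = [[0, 1], [2, 3]]
    M = np.kron(np.array([[1.0, 2.0], [3.0, 4.0]]), np.ones((2, 2)))
    C, _, _ = tensor_haar_coefficients(M, tree, tree)
    print(np.round(C, 6))
    print(np.allclose(compress(M, tree, tree, 3), M))
```

In the toy example the matrix is constant on every product of a row block and a column block, so all fine-scale coefficients vanish and only three coarse coefficients are nonzero; keeping those three reconstructs M exactly. A matrix that is smooth, but not piecewise constant, with respect to the two trees would typically have small fine-scale coefficients and a small ℓ^p sum, which is the situation in which thresholding gives an efficient approximation.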
Keywords: Data Matrix; Iterative Procedure; Potential Operator; Fast Multipole Method; Partition Tree
We are indebted to our collaborator Boaz Nadler for many insights regarding Haar-like bases. We also thank Rob Tibshirani and Fred Warner for their helpful comments. M.G. is supported by a William R. and Sara Hart Kimball Stanford Graduate Fellowship and is grateful for the hospitality of the Yale Applied Math program during the preparation of this work.