Abstract
For dimension reduction in ℓ1, one can multiply a data matrix A ∈ ℝn×D by a random matrix R ∈ ℝD×k (k ≪ D) whose entries are i.i.d. samples from the standard Cauchy distribution. A known impossibility result says that one cannot recover the pairwise ℓ1 distances in A from B = AR ∈ ℝn×k using linear estimators. However, nonlinear estimators are still useful for certain applications in data stream computation, information retrieval, learning, and data mining.
We propose three types of nonlinear estimators: the bias-corrected sample median estimator, the bias-corrected geometric mean estimator, and the bias-corrected maximum likelihood estimator (MLE). We derive tail bounds for the geometric mean estimator and establish that \(k = O\left(\frac{\log n}{\epsilon^2}\right)\) suffices, with the constants given explicitly. Asymptotically (as k → ∞), both the sample median estimator and the geometric mean estimator are about 80% as efficient as the MLE. We analyze the moments of the MLE and propose approximating its distribution by an inverse Gaussian.
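As a concrete illustration (a minimal sketch, not the paper's own code), the projection and two of the estimators can be written in a few lines of NumPy. The sketch relies on two standard facts about a standard Cauchy variable C: the median of |C| is 1 (so the plain sample median of |y| estimates the ℓ1 distance, before any bias correction), and E|C|^λ = 1/cos(πλ/2) for |λ| < 1 (which yields the bias-corrected geometric mean estimator). The dimensions D and k below are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

D, k = 10_000, 500          # original dimension, projected dimension (demo values)
x1 = rng.standard_normal(D)
x2 = rng.standard_normal(D)
true_l1 = np.abs(x1 - x2).sum()

# R: D x k matrix of i.i.d. standard Cauchy entries
R = rng.standard_cauchy((D, k))

# Each entry of y is distributed as ||x1 - x2||_1 times a standard Cauchy
y = (x1 - x2) @ R

# Sample median estimator: the median of |Cauchy| is 1
est_median = np.median(np.abs(y))

# Bias-corrected geometric mean estimator:
# E|C|^(1/k) = 1 / cos(pi/(2k)), so cos^k(pi/(2k)) * prod|y_j|^(1/k) is unbiased-corrected
est_gm = np.cos(np.pi / (2 * k)) ** k * np.exp(np.log(np.abs(y)).mean())

print(est_median / true_l1, est_gm / true_l1)  # both ratios should be near 1
```

Note that the geometric mean is computed via the mean of logs for numerical stability; multiplying k raw magnitudes directly would overflow or underflow for large k. The exact finite-k bias correction for the sample median estimator given in the paper is omitted here for brevity.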
© 2007 Springer Berlin Heidelberg
Li, P., Hastie, T.J., Church, K.W. (2007). Nonlinear Estimators and Tail Bounds for Dimension Reduction in ℓ1 Using Cauchy Random Projections. In: Bshouty, N.H., Gentile, C. (eds.) Learning Theory. COLT 2007. Lecture Notes in Computer Science, vol. 4539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72927-3_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72925-9
Online ISBN: 978-3-540-72927-3