Abstract
For dimension reduction in ℓ1, one can multiply a data matrix A ∈ ℝn×D by a random matrix R ∈ ℝD×k (k ≪ D) whose entries are i.i.d. samples from the standard Cauchy distribution. A known impossibility result says that one cannot recover the pairwise ℓ1 distances in A from B = AR ∈ ℝn×k using linear estimators. However, nonlinear estimators are still useful for certain applications in data stream computation, information retrieval, learning, and data mining.
We propose three types of nonlinear estimators: the bias-corrected sample median estimator, the bias-corrected geometric mean estimator, and the bias-corrected maximum likelihood estimator (MLE). We derive tail bounds for the geometric mean estimator and establish that \(k = O\left(\frac{\log n}{\epsilon^2}\right)\) suffices, with the constants given explicitly. Asymptotically (as k → ∞), both the sample median estimator and the geometric mean estimator are about 80% as efficient as the MLE. We analyze the moments of the MLE and propose approximating its distribution by an inverse Gaussian.
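As a concrete illustration (a minimal sketch, not the paper's own code), the projection and two of the estimators can be written in a few lines of NumPy. The sketch relies on two standard facts about a standard Cauchy variable C: the median of |C| is 1 (so the plain sample median of |y| estimates the ℓ1 distance, before any bias correction), and E|C|^λ = 1/cos(πλ/2) for |λ| < 1 (which yields the bias-corrected geometric mean estimator). The dimensions D and k below are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

D, k = 10_000, 500          # original dimension, projected dimension (demo values)
x1 = rng.standard_normal(D)
x2 = rng.standard_normal(D)
true_l1 = np.abs(x1 - x2).sum()

# R: D x k matrix of i.i.d. standard Cauchy entries
R = rng.standard_cauchy((D, k))

# Each entry of y is distributed as ||x1 - x2||_1 times a standard Cauchy
y = (x1 - x2) @ R

# Sample median estimator: the median of |Cauchy| is 1
est_median = np.median(np.abs(y))

# Bias-corrected geometric mean estimator:
# E|C|^(1/k) = 1 / cos(pi/(2k)), so cos^k(pi/(2k)) * prod|y_j|^(1/k) is unbiased-corrected
est_gm = np.cos(np.pi / (2 * k)) ** k * np.exp(np.log(np.abs(y)).mean())

print(est_median / true_l1, est_gm / true_l1)  # both ratios should be near 1
```

Note that the geometric mean is computed via the mean of logs for numerical stability; multiplying k raw magnitudes directly would overflow or underflow for large k. The exact finite-k bias correction for the sample median estimator given in the paper is omitted here for brevity.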
© 2007 Springer Berlin Heidelberg
Li, P., Hastie, T.J., Church, K.W. (2007). Nonlinear Estimators and Tail Bounds for Dimension Reduction in ℓ1 Using Cauchy Random Projections. In: Bshouty, N.H., Gentile, C. (eds.) Learning Theory. COLT 2007. Lecture Notes in Computer Science, vol. 4539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72927-3_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72925-9
Online ISBN: 978-3-540-72927-3