Boosting Correlation Based Penalization in Generalized Linear Models

Recent Advances in Linear Models and Related Areas

Linear models have a long tradition in statistics, as nicely summarized in Rao, Toutenburg, Shalabh, Heumann (2008). When the number of covariates is large, the estimation of unknown parameters frequently raises problems, and interest then usually focuses on data-driven subset selection of relevant regressors. The sophisticated monitoring equipment now routinely used in many data collection processes makes it possible to collect data with a huge number of regressors, even with considerably more explanatory variables than observations. One example is the analysis of microarray data of gene expressions. Here the typical tasks are to select variables and to classify samples into two or more alternative categories. Binary responses of this type may be handled within the framework of generalized linear models (Nelder and Wedderburn (1972)) and are also considered in Rao, Toutenburg, Shalabh, Heumann (2008).

In this paper we propose a new regularization method and a boosted version of it, both of which explicitly focus on the selection of groups. To this end we consider a correlation-based penalty that uses the correlation between variables as data-driven weights for penalization. See also Tutz and Ulbricht (2006) for a similar approach to linear models. The new method and some of its main properties are described in Section 2. A boosted version, presented in Section 3, allows for variable selection. In Section 4 we use simulated and real data sets to compare our new methods with existing ones.
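To make the idea of correlation as a penalization weight concrete, the following is a minimal sketch, not the chapter's own definition (which is not shown in this preview). It assumes the pairwise form usually given for the linear-model penalty of Tutz and Ulbricht (2006), namely the sum over pairs i < j of (β_i − β_j)² / (1 − ρ_ij) + (β_i + β_j)² / (1 + ρ_ij), where ρ_ij is the empirical correlation between covariates i and j. The function names `pearson` and `correlation_penalty` and the simulated data are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(x, y):
    """Empirical Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_penalty(beta, columns):
    """Pairwise correlation-weighted penalty (assumed form, see lead-in).

    beta    : list of coefficients, one per covariate
    columns : list of covariate columns, columns[i] holds covariate i
    Note: the expression is undefined when two covariates are perfectly
    correlated (|rho| = 1); this sketch does not handle that case.
    """
    p = len(beta)
    total = 0.0
    for i in range(p - 1):
        for j in range(i + 1, p):
            rho = pearson(columns[i], columns[j])
            # Strong positive correlation makes 1 - rho small, so unequal
            # coefficients are penalized heavily; strong negative
            # correlation does the same for coefficients of equal sign.
            total += (beta[i] - beta[j]) ** 2 / (1.0 - rho)
            total += (beta[i] + beta[j]) ** 2 / (1.0 + rho)
    return total


# Simulated example: x1 and x2 are strongly correlated, x3 is independent.
random.seed(0)
z = [random.gauss(0, 1) for _ in range(200)]
x1 = [v + 0.1 * random.gauss(0, 1) for v in z]
x2 = [v + 0.1 * random.gauss(0, 1) for v in z]
x3 = [random.gauss(0, 1) for _ in range(200)]
cols = [x1, x2, x3]

pen_same = correlation_penalty([1.0, 1.0, 0.0], cols)
pen_opposite = correlation_penalty([1.0, -1.0, 0.0], cols)
print(pen_same, pen_opposite)
```

Under this assumed form, giving the two highly correlated covariates equal coefficients costs far less than giving them coefficients of opposite sign, which is the sense in which such a penalty encourages correlated variables to enter or leave the model as a group. For uncorrelated covariates each pairwise term reduces to 2(β_i² + β_j²), a ridge-type penalty.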

References

  • Anderson JA, Blair V (1982) Penalized maximum likelihood estimation in logistic regression and discrimination. Biometrika 69:123-136

  • Breiman L (1998) Arcing classifiers. Annals of Statistics 26:801-849

  • Bühlmann P, Yu B (2003) Boosting with the L2 loss: Regression and classification. Journal of the American Statistical Association 98:324-339

  • Duffy DE, Santner TJ (1989) On the small sample properties of restricted maximum likelihood estimators for logistic regression models. Communication in Statistics, Theory & Methods 18:959-989

  • Fahrmeir L, Kaufmann H (1985) Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. The Annals of Statistics 13:342-368

  • Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Annals of Statistics 29:1189-1232

  • Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531-537

  • Hoerl AE, Kennard RW (1970) Ridge regression: Bias estimation for nonorthogonal problems. Technometrics 12:55-67

  • Meir R, Rätsch G (2003) An introduction to boosting and leveraging. In: Mendelson S, Smola A (eds) Advanced Lectures on Machine Learning, Springer, New York, pp 119-184

  • Nelder JA, Wedderburn RWM (1972) Generalized linear models. Journal of the Royal Statistical Society A 135:370-384

  • Nyquist H (1991) Restricted estimation of generalized linear models. Applied Statistics 40:133-141

  • Park MY, Hastie T (2007) An l1 regularization-path algorithm for generalized linear models. JRSS

  • Schaefer RL, Roi LD, Wolfe RA (1984) A ridge logistic estimate. Communication in Statistics, Theory & Methods 13:99-113

  • Segerstedt B (1992) On ordinary ridge regression in generalized linear models. Communication in Statistics, Theory & Methods 21:2227-2246

  • Shevade SK, Keerthi SS (2003) A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics 19:2246-2253

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58:267-288

  • Rao CR, Toutenburg H, Shalabh, Heumann C (2008) Linear Models - Least Squares and Generalizations (3rd edition). Springer, Berlin Heidelberg New York

  • Trenkler G, Toutenburg H (1990) Mean squared error matrix comparisons between biased estimators - an overview of recent results. Statistical Papers 31:165-179

  • Tutz G, Binder H (2007) Boosting ridge regression. Computational Statistics & Data Analysis (Appearing)

  • Tutz G, Leitenstorfer F (2007) Generalized smooth monotonic regression in additive modeling. Journal of Computational and Graphical Statistics 16:165-188

  • Tutz G, Ulbricht J (2006) Penalized regression with correlation based penalty. Discussion Paper 486, SFB 386, Universität München

  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B 67:301-320

Copyright information

© 2008 Physica-Verlag Heidelberg

About this chapter

Cite this chapter

Ulbricht, J., Tutz, G. (2008). Boosting Correlation Based Penalization in Generalized Linear Models. In: Recent Advances in Linear Models and Related Areas. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2064-5_9
