
Dimensionality of Random Subspaces

Conference paper
Classification — the Ubiquitous Challenge

Abstract

A significant improvement in classification accuracy can be obtained by aggregating multiple models. Most methods proposed in this field are based on sampling cases from the training set or on reweighting cases. Classification error can also be reduced by randomly selecting variables for the training subsamples or directly for the component models. In this paper we propose a feature selection method for ensembles that significantly reduces the dimensionality of the random subspaces.
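
The abstract refers to building ensembles by randomly selecting variables for the component models, i.e. the random subspace idea. Below is a minimal sketch of that general technique, assuming Python with NumPy and scikit-learn; the function and parameter names are illustrative only, and this is not the specific feature selection method proposed in the paper. Each component tree is fitted on a randomly drawn feature subset, and the ensemble predicts by majority vote.

    # Minimal sketch of a random-subspace ensemble (general technique only,
    # not the paper's proposed feature selection method). Assumes NumPy and
    # scikit-learn are available; all names here are illustrative.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def fit_random_subspace_ensemble(X, y, n_models=25, subspace_dim=5, seed=0):
        """Fit one decision tree per randomly drawn feature subset."""
        rng = np.random.default_rng(seed)
        n_features = X.shape[1]
        models = []
        for _ in range(n_models):
            # draw a random subspace of the variables (without replacement)
            features = rng.choice(n_features,
                                  size=min(subspace_dim, n_features),
                                  replace=False)
            tree = DecisionTreeClassifier(random_state=0).fit(X[:, features], y)
            models.append((features, tree))
        return models

    def predict_majority_vote(models, X):
        """Combine the component trees by simple majority voting.

        Assumes non-negative integer class labels in y.
        """
        votes = np.array([tree.predict(X[:, features])
                          for features, tree in models])
        return np.apply_along_axis(
            lambda column: np.bincount(column.astype(int)).argmax(),
            axis=0, arr=votes)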

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gatnar, E. (2005). Dimensionality of Random Subspaces. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_12
