Combining Feature Selection and Feature Construction to Improve Concept Learning for High Dimensional Data

Hanczar, Blaise

doi:10.1007/11527862_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3607)

Included in the following conference series:

International Symposium on Abstraction, Reformulation, and Approximation

Abstract

This paper describes and experimentally analyses a new dimension reduction method for microarray data. Microarrays, which simultaneously measure the expression levels of thousands of genes in a given situation (tissue, cell or time), produce data that pose particular machine-learning problems. The disproportion between the number of attributes (tens of thousands) and the number of examples (hundreds) requires a reduction in dimension. While gene/class mutual information is often used to filter genes, we propose an approach that takes gene-pair/class information into account. We propose a gene selection heuristic based on this principle, together with an automatic feature-construction procedure that forces the learning algorithms to make use of these gene pairs. We report significant improvements in accuracy on several public microarray databases.
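
To illustrate the contrast between gene/class and gene-pair/class information, here is a minimal Python sketch. It is not the heuristic from the paper, whose details the abstract does not give. It assumes expression values have already been discretized, scores pairs exhaustively by empirical mutual information, and builds a constructed feature as the joint state of two genes. The function names (mutual_information, pair_feature, rank_pairs) and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def pair_feature(gene_a, gene_b):
    """Constructed feature (assumption): the joint discrete state of two genes."""
    return list(zip(gene_a, gene_b))


def rank_pairs(genes, labels):
    """Score every gene pair by I((gene_a, gene_b); class), best first."""
    scores = []
    for a, b in combinations(sorted(genes), 2):
        score = mutual_information(pair_feature(genes[a], genes[b]), labels)
        scores.append((score, a, b))
    return sorted(scores, reverse=True)


# Hypothetical toy data: the class is the XOR of g1 and g2, so neither gene
# is informative alone, but the pair determines the class completely.
genes = {
    "g1": [0, 0, 1, 1, 0, 0, 1, 1],
    "g2": [0, 1, 0, 1, 0, 1, 0, 1],
    "g3": [0, 0, 0, 0, 1, 1, 1, 1],
}
labels = [0, 1, 1, 0, 0, 1, 1, 0]

single_scores = {name: mutual_information(values, labels) for name, values in genes.items()}
best_score, best_a, best_b = rank_pairs(genes, labels)[0]
```

On this toy data a single-gene filter scores every gene at zero, while the pair (g1, g2) carries one full bit about the class. With tens of thousands of genes an exhaustive pair search is quadratic in the number of genes, which is presumably why a selection heuristic is needed in practice.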

Author information

Authors and Affiliations

Lim&Bio, University Paris 13, Bobigny, France
Blaise Hanczar

Editor information

Editors and Affiliations

UR 079 GEODES, IRD, 32 avenue Henri Varagnat, 93143, Bondy, France
Jean-Daniel Zucker
Dip. di Informatica, Università del Piemonte Orientale, Via Bellini 25/G, 15100, Alessandria, Italy
Lorenza Saitta

About this paper

Cite this paper

Hanczar, B. (2005). Combining Feature Selection and Feature Construction to Improve Concept Learning for High Dimensional Data. In: Zucker, JD., Saitta, L. (eds) Abstraction, Reformulation and Approximation. SARA 2005. Lecture Notes in Computer Science, vol 3607. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527862_19

DOI: https://doi.org/10.1007/11527862_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27872-6
Online ISBN: 978-3-540-31882-8
eBook Packages: Computer Science (R0)
