Combining Feature Selection and Feature Construction to Improve Concept Learning for High Dimensional Data

  • Conference paper
Abstraction, Reformulation and Approximation (SARA 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3607)

Abstract

This paper describes and experimentally analyses a new dimension reduction method for microarray data. Microarrays, which allow simultaneous measurement of the expression levels of thousands of genes in a given situation (tissue, cell or time), produce data that pose particular machine-learning problems. The disproportion between the number of attributes (tens of thousands) and the number of examples (hundreds) requires a reduction in dimension. While gene/class mutual information is often used to filter the genes, we propose an approach that takes into account gene-pair/class information. A gene selection heuristic based on this principle is proposed, as well as an automatic feature-construction procedure that forces the learning algorithms to make use of these gene pairs. We report significant improvements in accuracy on several public microarray databases.
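
The difference between gene/class and gene-pair/class information can be illustrated with a small sketch. It is not the paper's heuristic, whose details are not given in the abstract. It assumes expression values have already been discretized (here to 0/1), and the function names `mutual_information`, `pair_information` and `rank_pairs`, as well as the toy data, are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in bits, of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed combinations.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def pair_information(g1, g2, labels):
    """Information that the joint value of two genes carries about the class."""
    return mutual_information(list(zip(g1, g2)), labels)


def rank_pairs(genes, labels):
    """Rank all gene pairs by pair/class information, highest first.

    `genes` maps a gene name to its list of discretized expression values.
    Exhaustive enumeration is only feasible for a pre-filtered gene set.
    """
    scored = [
        ((a, b), pair_information(genes[a], genes[b], labels))
        for a, b in combinations(sorted(genes), 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (an assumption for illustration): the class is the XOR of g1 and g2,
# so neither gene is informative alone, while the pair determines the class.
genes = {
    "g1": [0, 0, 1, 1],
    "g2": [0, 1, 0, 1],
    "g3": [0, 0, 0, 0],
}
labels = [0, 1, 1, 0]

single_scores = {name: mutual_information(v, labels) for name, v in genes.items()}
ranking = rank_pairs(genes, labels)
```

On this toy data every gene has zero mutual information with the class when taken alone, so a filter based on single-gene scores would discard all three, whereas the pair (`g1`, `g2`) is ranked first with one full bit of information. With real microarray data the number of pairs grows quadratically with the number of genes, which is why some heuristic restriction of the candidate pairs would be needed.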

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hanczar, B. (2005). Combining Feature Selection and Feature Construction to Improve Concept Learning for High Dimensional Data. In: Zucker, JD., Saitta, L. (eds) Abstraction, Reformulation and Approximation. SARA 2005. Lecture Notes in Computer Science, vol 3607. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527862_19

  • DOI: https://doi.org/10.1007/11527862_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27872-6

  • Online ISBN: 978-3-540-31882-8

  • eBook Packages: Computer Science, Computer Science (R0)
