Simultaneous Relevant Feature Identification and Classification in High-Dimensional Spaces

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2452)

Abstract

Molecular profiling technologies monitor thousands of transcripts, proteins, metabolites or other species concurrently in biological samples of interest. Given two-class, high-dimensional profiling data, nominal Liknon [4] is a specific implementation of a methodology for performing simultaneous relevant feature identification and classification. It exploits the well-known property that minimizing an l1 norm (via linear programming) yields a sparse hyperplane [2, 8, 15, 17, 26]. This work (i) examines computational, software and practical issues required to realize nominal Liknon, (ii) summarizes results from its application to five real-world data sets, (iii) outlines heuristic solutions to problems posed by domain experts when interpreting the results and (iv) defines some future directions of the research.
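
As a rough illustration of the property the abstract relies on, the sketch below solves a generic l1-norm soft-margin linear program with SciPy and returns a hyperplane whose weight vector is typically sparse. It is not the Liknon implementation: the function name `l1_sparse_hyperplane`, the penalty parameter `C`, and the exact objective (sum of absolute weights plus `C` times the total slack) are assumptions made for illustration, since the abstract does not state the formulation.

```python
import numpy as np
from scipy.optimize import linprog


def l1_sparse_hyperplane(X, y, C=10.0):
    """Fit w, b by minimizing ||w||_1 + C * sum(xi) subject to
    y_i * (w . x_i + b) >= 1 - xi_i and xi_i >= 0.

    Hypothetical helper: X is (n, d), y holds labels in {-1.0, +1.0}.
    """
    n, d = X.shape
    # Decision vector z = [u (d), v (d), b (1), xi (n)] with w = u - v, u, v >= 0,
    # so that sum(u) + sum(v) equals ||w||_1 at the optimum.
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    Yx = y[:, None] * X
    # Margin constraints rewritten in the A_ub @ z <= b_ub form linprog expects.
    A_ub = -np.hstack([Yx, -Yx, y[:, None], np.eye(n)])
    b_ub = -np.ones(n)
    # u, v, xi are non-negative; the offset b is unconstrained in sign.
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:d] - res.x[d:2 * d]
    b = res.x[2 * d]
    return w, b


if __name__ == "__main__":
    # Synthetic two-class data: 40 samples, 200 features, only feature 0 informative.
    rng = np.random.default_rng(0)
    y = np.where(np.arange(40) % 2 == 0, 1.0, -1.0)
    X = rng.normal(size=(40, 200))
    X[:, 0] += 4.0 * y
    w, b = l1_sparse_hyperplane(X, y)
    print("features with nonzero weight:", np.flatnonzero(np.abs(w) > 1e-6))
```

In a sketch like this, the features with nonzero weight play the role of the "relevant features" and the sign of `X @ w + b` gives the class, so identification and classification come out of the same optimization.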

References

  1. S.V. Allander, N.N. Nupponen, M. Ringner, G. Hostetter, G.W. Maher, N. Goldberger, Y. Chen, J. Carpten, A.G. Elkahloun, and P.S. Meltzer. Gastrointestinal Stromal Tumors with KIT mutations exhibit a remarkably homogeneous gene expression profile. Cancer Research, 61:8624–8628, 2001.

  2. K. Bennett and A. Demiriz. Semi-supervised support vector machines. In Advances in Neural Information Processing Systems, volume 11. MIT Press, Cambridge MA, 1999.

  3. A. Bhattacharjee, W.G. Richards, J. Staunton, C. Li, S. Monti, P. Vasa, C. Ladd, J. Beheshti, R. Bueno, M. Gillette, M. Loda, G. Weber, E.J. Mark, E.S. Lander, W. Wong, B.E. Johnson, T.R. Golub, D.J. Sugarbaker, and M. Meyerson. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl. Acad. Sci., 98:13790–13795, 2001.

  4. C. Bhattacharyya, L.R. Grate, A. Rizki, D.C. Radisky, F.J. Molina, M.I. Jordan, M.J. Bissell, and I.S. Mian. Simultaneous relevant feature identification and classification in high-dimensional spaces: application to molecular profiling data. Submitted, Signal Processing, 2002.

  5. M.P. Brown, W.N. Grundy, D. Lin, N. Cristianini, C.W. Sugnet, T.S. Furey, M. Ares, Jr, and D. Haussler. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci., 97:262–267, 2000.

  6. P. Cheeseman and J. Stutz. Bayesian Classification (AutoClass): Theory and Results. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 153–180. AAAI Press/MIT Press, 1995. The software is available at the URL http://www.gnu.org/directory/autoclass.html.

  7. M.L. Chow, E.J. Moler, and I.S. Mian. Identifying marker genes in transcription profile data using a mixture of feature relevance experts. Physiological Genomics, 5:99–111, 2001.

  8. N. Cristianini and J. Shawe-Taylor. Support Vector Machines and other kernel-based learning methods. Cambridge University Press, Cambridge, England, 2000.

  9. S.M. Dhanasekaran, T.R. Barrette, R. Ghosh, D. Shah, S. Varambally, K. Kurachi, K.J. Pienta, M.J. Rubin, and A.M. Chinnaiyan. Delineation of prognostic biomarkers in prostate cancer. Nature, 412:822–826, 2001.

  10. D.L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. Technical Report, Statistics Department, Stanford University, 1999.

  11. R. Fletcher. Practical Methods of Optimization. John Wiley & Sons, New York, 2000.

  12. T. Furey, N. Cristianini, N. Duffy, D. Bednarski, M. Schummer, and D. Haussler. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16:906–914, 2000.

  13. M.E. Garber, O.G. Troyanskaya, K. Schluens, S. Petersen, Z. Thaesler, M. Pacyna-Gengelbach, M. van de Rijn, G.D. Rosen, C.M. Perou, R.I. Whyte, R.B. Altman, P.O. Brown, D. Botstein, and I. Petersen. Diversity of gene expression in adenocarcinoma of the lung. Proc. Natl. Acad. Sci., 98:13784–13789, 2001.

  14. T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999. The data are available at the URL http://waldo.wi.mit.edu/MPR/data_sets.html.

  15. T. Graepel, R. Herbrich, B. Schölkopf, A.J. Smola, P. Bartlett, K. Müller, K. Obermayer, and R.C. Williamson. Classification on proximity data with LP-machines. In Ninth International Conference on Artificial Neural Networks, volume 470, pages 304–309. IEE, London, 1999.

  16. L.R. Grate, C. Bhattacharyya, M.I. Jordan, and I.S. Mian. Integrated analysis of transcript profiling and protein sequence data. In press, Mechanisms of Ageing and Development, 2002.

  17. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York, 2000.

  18. I. Hedenfalk, D. Duggan, Y. Chen, M. Radmacher, M. Bittner, R. Simon, P. Meltzer, B. Gusterson, M. Esteller, M. Raffeld, Z. Yakhini, A. Ben-Dor, E. Dougherty, J. Kononen, L. Bubendorf, W. Fehrle, S. Pittaluga, S. Gruvberger, N. Loman, O. Johannsson, H. Olsson, B. Wilfond, G. Sauter, O.-P. Kallioniemi, A. Borg, and J. Trent. Gene-expression profiles in hereditary breast cancer. New England Journal of Medicine, 344:539–548, 2001.

  19. J. Khan, J.S. Wei, M. Ringner, L.H. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C.R. Antonescu, C. Peterson, and P.S. Meltzer. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7:673–679, 2001.

  20. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M.I. Jordan. Minimax probability machine. Advances in Neural Information Processing Systems, 14, 2001.

  21. L.A. Liotta, E.C. Kohn, and E.F. Petricoin. Clinical proteomics: personalized molecular medicine. JAMA, 286:2211–2214, 2001.

  22. E.J. Moler, M.L. Chow, and I.S. Mian. Analysis of molecular profile data using generative and discriminative methods. Physiological Genomics, 4:109–126, 2000.

  23. D.A. Notterman, U. Alon, A.J. Sierk, and A.J. Levine. Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays. Cancer Research, 61:3124–3130, 2001.

  24. E.F. Petricoin III, A.M. Ardekani, B.A. Hitt, P.J. Levine, V.A. Fusaro, S.M. Steinberg, G.B. Mills, C. Simone, D.A. Fishman, E.C. Kohn, and L.A. Liotta. Use of proteomic patterns in serum to identify ovarian cancer. The Lancet, 359:572–577, 2002.

  25. S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C.-H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J.P. Mesirov, T. Poggio, W. Gerald, M. Loda, E.S. Lander, and T.R. Golub. Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl. Acad. Sci., 98:15149–15154, 2001. The data are available from http://www-genome.wi.mit.edu/mpr/GCM.html.

  26. A. Smola, T.T. Friess, and B. Schölkopf. Semiparametric support vector and linear programming machines. In Advances in Neural Information Processing Systems, volume 11. MIT Press, Cambridge MA, 1999.

  27. T. Sorlie, C.M. Perou, R. Tibshirani, T. Aas, S. Geisler, H. Johnsen, T. Hastie, M.B. Eisen, M. van de Rijn, S.S. Jeffrey, T. Thorsen, H. Quist, J.C. Matese, P.O. Brown, D. Botstein, P.E. Lonning, and A.-L. Borresen-Dale. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci., 98:10869–10874, 2001.

  28. A.I. Su, J.B. Welsh, L.M. Sapinoso, S.G. Kern, P. Dimitrov, H. Lapp, P.G. Schultz, S.M. Powell, C.A. Moskaluk, H.F. Frierson Jr, and G.M. Hampton. Molecular classification of human carcinomas by use of gene expression signatures. Cancer Research, 61:7388–7393, 2001.

  29. L.J. van’t Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, and S.H. Friend. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415:530–536, 2002.

  30. V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.

  31. J.B. Welsh, L.M. Sapinoso, A.I. Su, S.G. Kern, J. Wang-Rodriguez, C.A. Moskaluk, H.F. Frierson Jr, and G.M. Hampton. Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Research, 61:5974–5978, 2001.

  32. J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik. Feature Selection for SVMs. In Advances in Neural Information Processing Systems, volume 13, 2000.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grate, L.R., Bhattacharyya, C., Jordan, M.I., Mian, I.S. (2002). Simultaneous Relevant Feature Identification and Classification in High-Dimensional Spaces. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_1

  • DOI: https://doi.org/10.1007/3-540-45784-4_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44211-0

  • Online ISBN: 978-3-540-45784-8

  • eBook Packages: Springer Book Archive