Design Issues and Comparison of Methods for Microarray-Based Classification

Dougherty, Edward R.; Attoor, Sanju N.

doi:10.1007/0-306-47825-0_7

Conclusion

Except in situations where the amount of data is large in comparison to the number of variables, classifier design and error estimation involve subtle issues. This is especially so in applications such as cancer classification, where there is no prior knowledge concerning the vector-label distributions involved. It is clearly prudent to try to achieve classification using small numbers of genes and rules of low complexity (low VC dimension), and to use cross-validation when it is not possible to obtain large independent samples for testing. Even when one uses a cross-validation method such as leave-one-out estimation, one is still confronted by the high variance of the estimator. In many applications, large samples are impossible to obtain owing to either cost or availability. Therefore, it is unlikely that a statistical approach alone will provide satisfactory results. Rather, one can use the results of classification analysis to discover gene sets that potentially provide good discrimination, and then focus attention on these. In the same vein, one can utilize the common engineering approach of integrating data with human knowledge to arrive at satisfactory systems.
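
The point about the variance of leave-one-out estimation can be illustrated with a small simulation. The sketch below is not taken from the chapter: the two-gene Gaussian model, the nearest-mean rule, and all function names and parameter values are assumptions chosen only for illustration. It repeatedly draws small samples, computes the leave-one-out error of a low-complexity classifier on each, and reports the mean and standard deviation of the resulting estimates.

```python
import random
import statistics


def nearest_mean_predict(train, x):
    """Classify x by the closer class mean (a simple, low-complexity rule)."""
    dist = {}
    for label in (0, 1):
        pts = [p for p, y in train if y == label]
        mean = [sum(c) / len(pts) for c in zip(*pts)]
        dist[label] = sum((a - b) ** 2 for a, b in zip(x, mean))
    return min(dist, key=dist.get)


def leave_one_out_error(sample):
    """Hold out each point once, train on the rest, and count the mistakes."""
    errors = 0
    for i, (x, y) in enumerate(sample):
        train = sample[:i] + sample[i + 1:]
        if nearest_mean_predict(train, x) != y:
            errors += 1
    return errors / len(sample)


def draw_sample(rng, n_per_class, n_genes=2, shift=1.0):
    """Hypothetical model: unit-variance Gaussians whose means differ by `shift`."""
    if n_per_class < 2:
        raise ValueError("need at least two points per class for leave-one-out")
    sample = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(label * shift, 1.0) for _ in range(n_genes)]
            sample.append((x, label))
    return sample


def loo_spread(n_per_class, repeats=100, seed=0):
    """Mean and standard deviation of the leave-one-out estimate over many samples."""
    rng = random.Random(seed)
    estimates = [
        leave_one_out_error(draw_sample(rng, n_per_class))
        for _ in range(repeats)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (5, 50):
        mean, sd = loo_spread(n)
        print(f"n per class = {n:3d}: mean LOO error = {mean:.3f}, sd = {sd:.3f}")
```

Under these assumptions, the spread of the estimate across samples is much wider for 5 points per class than for 50, even though the underlying distributions are identical. A single leave-one-out figure from a small microarray study should therefore be read with caution.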

Author information

Authors and Affiliations

Department of Electrical Engineering, Texas A&M University, College Station, TX, USA
Edward R. Dougherty & Sanju N. Attoor

Editor information

Editors and Affiliations

University of Texas M. D. Anderson Cancer Center, Texas, USA
Wei Zhang & Ilya Shmulevich

About this chapter

Cite this chapter

Dougherty, E.R., Attoor, S.N. (2003). Design Issues and Comparison of Methods for Microarray-Based Classification. In: Zhang, W., Shmulevich, I. (eds) Computational and Statistical Approaches to Genomics. Springer, Boston, MA. https://doi.org/10.1007/0-306-47825-0_7

DOI: https://doi.org/10.1007/0-306-47825-0_7
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4020-7023-5
Online ISBN: 978-0-306-47825-3
eBook Packages: Springer Book Archive