Conclusion
Except in situations where the amount of data is large in comparison to the number of variables, classifier design and error estimation involve subtle issues. This is especially so in applications such as cancer classification, where there is no prior knowledge concerning the vector-label distributions involved. It is clearly prudent to try to achieve classification using small numbers of genes and rules of low complexity (low VC dimension), and to use cross-validation when it is not possible to obtain large independent samples for testing. Even when one uses a cross-validation method such as leave-one-out estimation, one is still confronted by the high variance of the estimator. In many applications, large samples are impossible to obtain owing to either cost or availability. It is therefore unlikely that a statistical approach alone will provide satisfactory results. Rather, one can use the results of classification analysis to discover gene sets that potentially provide good discrimination, and then focus attention on these. In the same vein, one can utilize the common engineering approach of integrating data with human knowledge to arrive at satisfactory systems.
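The point about the variance of leave-one-out estimation can be illustrated with a small simulation. The following is a minimal sketch and not the procedure used in the chapter: it assumes a hypothetical one-gene model (class 0 drawn from N(0, 1), class 1 from N(shift, 1)) and a simple nearest-mean rule, and the function names `nearest_mean_classifier`, `leave_one_out_error`, `draw_sample`, and `loo_spread` are introduced here purely for illustration.

```python
import random
import statistics


def nearest_mean_classifier(train):
    """Fit a nearest-mean rule on (x, label) pairs with labels 0 and 1."""
    m0 = statistics.fmean(x for x, y in train if y == 0)
    m1 = statistics.fmean(x for x, y in train if y == 1)
    return lambda x: 0 if abs(x - m0) <= abs(x - m1) else 1


def leave_one_out_error(sample):
    """Fraction of points misclassified when each is held out in turn."""
    errors = 0
    for i, (x, y) in enumerate(sample):
        # Design the rule on all points except the i-th, then test on it.
        rule = nearest_mean_classifier(sample[:i] + sample[i + 1:])
        errors += rule(x) != y
    return errors / len(sample)


def draw_sample(rng, n_per_class, shift=1.0):
    """Assumed toy model: class 0 ~ N(0, 1), class 1 ~ N(shift, 1)."""
    return ([(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
            + [(rng.gauss(shift, 1.0), 1) for _ in range(n_per_class)])


def loo_spread(n_per_class, repeats=200, seed=0):
    """Mean and standard deviation of the leave-one-out estimate
    over many independently drawn samples of the same size."""
    rng = random.Random(seed)
    estimates = [leave_one_out_error(draw_sample(rng, n_per_class))
                 for _ in range(repeats)]
    return statistics.fmean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (5, 50):  # 10 versus 100 samples in total
        mean, sd = loo_spread(n)
        print(f"n per class = {n:3d}: mean LOO error = {mean:.3f}, sd = {sd:.3f}")
```

Under these assumptions, the spread of the estimate across repeated samples is considerably wider at 5 samples per class than at 50. A single leave-one-out figure from a small study should be read with that spread in mind.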
Copyright information
© 2003 Kluwer Academic Publishers
About this chapter
Cite this chapter
Dougherty, E.R., Attoor, S.N. (2003). Design Issues and Comparison of Methods for Microarray-Based Classification. In: Zhang, W., Shmulevich, I. (eds) Computational and Statistical Approaches to Genomics. Springer, Boston, MA. https://doi.org/10.1007/0-306-47825-0_7
DOI: https://doi.org/10.1007/0-306-47825-0_7
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4020-7023-5
Online ISBN: 978-0-306-47825-3
eBook Packages: Springer Book Archive