Abstract
We recently developed a multivariate method that selects a subset of discriminative genes for sample classification based on gene expression data. The method combines a search tool, a genetic algorithm (GA), with a non-parametric pattern recognition method based on the k-nearest neighbors (KNN) approach. Using a training set, we first select many subsets of genes that can discriminate among classes of samples. The genes are then ranked by how frequently they are selected, and the top-ranked genes (e.g., the top 50) are used to classify the test set samples. For a widely available set of leukemia data, the top 50 genes identified by the GA/KNN method not only correctly classified 33 of the 34 test set samples, but also discovered the two distinct clinical subtypes within ALL without applying prior knowledge. The method has been successfully applied to several expression data sets. It may be used to identify a subset of informative genes (biomarkers) for sample classification in a variety of profiling studies, including studies of tumors.
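The workflow described in the abstract (GA search for discriminative gene subsets, ranking genes by selection frequency, then KNN classification of the test set with the top-ranked genes) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a majority-vote KNN with k = 3, a simplified GA that uses only truncation selection and single-gene mutation, leave-one-out training accuracy as the fitness, and small synthetic data in which genes 0 to 2 are made informative. All function names and parameter values are hypothetical.

```python
import random
from collections import Counter

def distance(a, b, genes):
    # Squared Euclidean distance restricted to the chosen genes.
    return sum((a[g] - b[g]) ** 2 for g in genes)

def knn_predict(train_X, train_y, sample, genes, k=3):
    # Majority vote among the k nearest training samples (assumed rule).
    nearest = sorted(range(len(train_X)),
                     key=lambda i: distance(train_X[i], sample, genes))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def fitness(X, y, genes, k=3):
    # Leave-one-out classification accuracy on the training set.
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], genes, k) == y[i]
    return hits / len(X)

def mutate(genes, n_genes, rng):
    # Replace one gene in the subset with a gene not currently in it.
    child = list(genes)
    pool = [g for g in range(n_genes) if g not in child]
    child[rng.randrange(len(child))] = rng.choice(pool)
    return tuple(sorted(child))

def ga_search(X, y, subset_size, rng, pop_size=10, generations=8):
    # Simplified GA: keep the fitter half, refill with mutated survivors.
    n_genes = len(X[0])
    pop = [tuple(sorted(rng.sample(range(n_genes), subset_size)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(X, y, g), reverse=True)
        survivors = pop[:pop_size // 2]
        pop = survivors + [mutate(rng.choice(survivors), n_genes, rng)
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=lambda g: fitness(X, y, g))

def rank_genes(X, y, subset_size, n_runs, rng):
    # Run the GA repeatedly and rank genes by how often they are selected.
    counts = Counter()
    for _ in range(n_runs):
        counts.update(ga_search(X, y, subset_size, rng))
    return [g for g, _ in counts.most_common()]

def make_data(n_samples, n_genes, informative, rng):
    # Synthetic two-class data; informative genes are shifted in class 1.
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        for g in informative:
            row[g] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

rng = random.Random(0)
train_X, train_y = make_data(20, 30, (0, 1, 2), rng)
test_X, test_y = make_data(10, 30, (0, 1, 2), rng)
ranked = rank_genes(train_X, train_y, subset_size=3, n_runs=15, rng=rng)
top = ranked[:3]  # analogous to taking the top 50 genes in the chapter
correct = sum(knn_predict(train_X, train_y, s, top) == t
              for s, t in zip(test_X, test_y))
print("top genes:", top, "test accuracy:", correct / len(test_y))
```

Ranking by selection frequency across many independent GA runs, rather than trusting any single near-optimal subset, reflects the idea in the abstract that genes appearing in many discriminative subsets are the most informative.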
Copyright information
© 2002 Springer Science+Business Media New York
About this chapter
Cite this chapter
Li, L., Pedersen, L.G., Darden, T.A., Weinberg, C.R. (2002). Computational Analysis of Leukemia Microarray Expression Data Using the GA/KNN Method. In: Lin, S.M., Johnson, K.F. (eds) Methods of Microarray Data Analysis. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0873-1_7
DOI: https://doi.org/10.1007/978-1-4615-0873-1_7
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5281-5
Online ISBN: 978-1-4615-0873-1
eBook Packages: Springer Book Archive