Classification of Acute Leukemia Based on DNA Microarray Gene Expressions Using Partial Least Squares

  • Chapter
Methods of Microarray Data Analysis

Abstract

Analysis of microarray data that starts from raw gene expression intensities often proceeds in two main steps. First, the data are pre-processed by rescaling and standardizing so that overall intensities are equivalent across arrays. Second, statistical methodologies are applied to answer the scientific questions of interest. In this paper, for the pre-processing step, we introduce a thresholding algorithm for rescaling each array. The second step involves statistical classification and dimension reduction. For this we introduce the method of partial least squares (PLS) and apply it to the leukemia microarray data set of Golub et al. (1999). We also discuss the use of principal components analysis (PCA), quadratic discriminant analysis (QDA), and logistic discrimination (LD). Finally, we discuss other potential applications of PLS to gene expression data: prediction of a target gene, prediction of the reaction in cell lines, assessment of patient survival, and generalisations to predicting multiple classes.
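As a rough illustration of the second step, the sketch below reduces a gene expression matrix to a few PLS components and then applies logistic discrimination to the component scores. It is not the chapter's implementation: the data are simulated, the thresholding pre-processing is not reproduced (its details are not given in the abstract), and the function names (`simulate`, `pls1_fit`, `fit_logistic`), the NIPALS-style deflation, the ridge-penalised gradient descent, and all parameter values are assumptions made for this example.

```python
import numpy as np


def simulate(rng, n, p=200, n_informative=10, shift=2.5):
    """Simulated stand-in for expression data: n samples (n even), p genes."""
    y = np.tile([0.0, 1.0], n // 2)              # balanced 0/1 class labels
    X = rng.normal(size=(n, p))
    X[:, :n_informative] += shift * y[:, None]   # only a few genes carry signal
    return X, y


def pls1_fit(X, y, n_components):
    """PLS with a single response, fitted by NIPALS-style deflation.

    Returns the column means of X and a matrix R such that
    (X - x_mean) @ R gives the PLS component scores.
    """
    x_mean = X.mean(axis=0)
    Xk = X - x_mean
    yk = y - y.mean()
    W, P = [], []
    for _ in range(n_components):
        w = Xk.T @ yk                    # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                       # component scores
        tt = t @ t
        p = Xk.T @ t / tt                # loadings
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - t * (t @ yk) / tt      # deflate y
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    R = W @ np.linalg.inv(P.T @ W)       # maps centered X directly to scores
    return x_mean, R


def fit_logistic(T, y, n_iter=2000, lr=0.5, ridge=1e-2):
    """Logistic discrimination on scores T by ridge-penalised gradient descent."""
    A = np.column_stack([np.ones(len(T)), T])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(A @ beta)))
        beta -= lr * (A.T @ (prob - y) / len(y) + ridge * beta)
    return beta


def predict(T, beta):
    """Assign class 1 when the fitted log-odds are positive."""
    A = np.column_stack([np.ones(len(T)), T])
    return (A @ beta > 0).astype(float)


rng = np.random.default_rng(0)
X_train, y_train = simulate(rng, 40)
X_test, y_test = simulate(rng, 40)

# Dimension reduction: a few PLS components instead of hundreds of genes.
x_mean, R = pls1_fit(X_train, y_train, n_components=2)
T_train = (X_train - x_mean) @ R
scale = T_train.std(axis=0)              # put scores on a common scale

# Classification on the reduced scores; test samples reuse the training fit.
beta = fit_logistic(T_train / scale, y_train)
T_test = (X_test - x_mean) @ R
test_accuracy = float(np.mean(predict(T_test / scale, beta) == y_test))
print(f"test accuracy: {test_accuracy:.2f}")
```

The point of the two-stage design is that the number of genes far exceeds the number of samples, so a discriminant rule cannot be fitted reliably to the raw matrix; the PLS scores give a low-dimensional input, constructed with reference to the class labels, on which LD or QDA can operate.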

References

  • Alon et al. (1999), “Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays,” Proceedings of the National Academy of Sciences, 96, 6745–6750.

  • Alizadeh et al. (2000), “Distinct Types of Diffuse Large B-Cell Lymphoma Identified by Gene Expression Profiling,” Nature, 403, 503–511.

  • Bittner et al. (2000), “Molecular Classification of Cutaneous Malignant Melanoma by Gene Expression Profiling,” Nature, 406, 536–540.

  • de Jong, S. (1993), “SIMPLS: An Alternative Approach to Partial Least Squares Regression,” Chemometrics and Intelligent Laboratory Systems, 18, 251–263.

  • Dudoit, S., Fridlyand, J., Speed, T.P. (2000), “Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data,” Technical Report #576, Department of Statistics, U. C. Berkeley.

  • Flury, B. (1997), A First Course in Multivariate Analysis. Springer-Verlag, New York.

  • Frank, I.E., and Friedman, J.H. (1993), “A Statistical View of Some Chemometric Regression Tools” (with discussion), Technometrics, 35, 109–148.

  • Garthwaite, P.H. (1994), “An Interpretation of Partial Least Squares,” Journal of the American Statistical Association, 89, 122–127.

  • Geladi, P., and Kowalski, B.R. (1986), “Partial Least Squares Regression: Tutorial,” Analytica Chimica Acta, 185, 1–17.

  • Golub et al. (1999), “Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring,” Science, 286, 531–537.

  • Hand, D.J. (1981), Discrimination and Classification. John Wiley & Sons, Chichester, England.

  • Hand, D.J. (1997), Construction and Assessment of Classification Rules. John Wiley & Sons, Chichester, England.

  • Helland, I.S. (1988), “On the Structure of Partial Least Squares,” Communications in Statistics-Simulation and Computation, 17, 581–607.

  • Helland, I.S., and Almoy, T. (1994), “Comparison of Prediction Methods When Only a Few Components are Relevant,” Journal of the American Statistical Association, 89, 583–591.

  • Hoskuldsson, A. (1988), “PLS Regression Methods,” Journal of Chemometrics, 2, 211–228.

  • Johnson, R.A. and Wichern, D.W. (1992), Applied Multivariate Analysis. Prentice-Hall, New Jersey, 4th edition.

  • Jolliffe, I.T. (1986), Principal Component Analysis. Springer-Verlag, New York.

  • Lorber, A., Wangen, L.E., and Kowalski, B.R. (1987), “A Theoretical Foundation for the PLS Algorithm,” Journal of Chemometrics, 1, 19–31.

  • Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979), Multivariate Analysis. Academic Press, London.

  • Martens, H. and Naes, T. (1989), Multivariate Calibration. John Wiley & Sons, New York.

  • Massy, W.F. (1965), “Principal Components Regression in Exploratory Statistical Research,” Journal of the American Statistical Association, 60, 234–246.

  • Nguyen, D.V. and Rocke, D.M. (2000), “Classification in High Dimension with Application to DNA Microarray Data,” manuscript.

  • Nguyen, D.V. and Rocke, D.M. (2001), “Tumor Classification by Partial Least Squares Using Microarray Gene Expression Data,” to appear in Bioinformatics.

  • Nguyen, D.V. and Rocke, D.M. (2001b), “Partial Least Squares Proportional Hazard Regression for Application to DNA Microarray Data,” manuscript.

  • Nguyen, D.V. and Rocke, D.M. (2001c), “Multi-Class Cancer Classification Via Partial Least Squares Using Gene Expression Profiles,” manuscript.

  • Perou et al. (2000), “Molecular Portraits of Human Breast Tumours,” Nature, 406, 747–752.

  • Perou et al. (1999), “Distinctive Gene Expression Patterns in Human Mammary Epithelial Cells and Breast Cancers,” Proceedings of the National Academy of Sciences, USA, 96, 9212–9217.

  • Phatak, A., Reilly, P.M., and Penlidis, A. (1992), “The Geometry of 2-Block Partial Least Squares,” Communications in Statistics-Theory and Methods, 21, 1517–1553.

  • Press, S.J. (1982), Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference. Robert E. Krieger Publishing Company Inc., Malabar, Florida, 2nd edition.

  • Rocke, D.M. and Durbin, B. (2000), “A Model for Measurement Error for Gene Expression Arrays,” to appear in Journal of Computational Biology.

  • Ross et al. (2000), “Systematic Variation in Gene Expression Patterns in Human Cancer Cell Lines,” Nature Genetics, 24, 227–235.

  • Scherf et al. (2000), “A Gene Expression Database for the Molecular Pharmacology of Cancer,” Nature Genetics, 24, 236–244.

  • Stone, M., and Brooks, R. J. (1990), “Continuum Regression: Cross-validated Sequentially Constructed Prediction Embracing Ordinary Least Squares, Partial Least Squares, and Principal Components Regression” (with discussion), Journal of the Royal Statistical Society, Series B, 52, 237–269.

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Nguyen, D.V., Rocke, D.M. (2002). Classification of Acute Leukemia Based on DNA Microarray Gene Expressions Using Partial Least Squares. In: Lin, S.M., Johnson, K.F. (eds) Methods of Microarray Data Analysis. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0873-1_9

  • DOI: https://doi.org/10.1007/978-1-4615-0873-1_9

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5281-5

  • Online ISBN: 978-1-4615-0873-1

  • eBook Packages: Springer Book Archive
