
Identifying Important Explanatory Variables for Time-Varying Outcomes

Chapter in: Fundamentals of Data Mining in Genomics and Proteomics

Abstract

Many applications in modern biology measure a large number of genomic or proteomic covariates with the aim of assessing the impact of each covariate on a particular outcome of interest. In a study that follows a cohort of HIV-positive patients over time, for example, a researcher may genotype the virus infecting each patient to determine the presence or absence of a large number of mutations, in the hope of identifying mutations that affect how a patient’s plasma HIV RNA level (viral load) responds to a new drug regimen. Along with an estimate of the impact of each mutation on the time course of viral load, the researcher generally wants a measure of the statistical significance of each estimate, in order to identify the mutations most likely to be genuinely related to the outcome. Such information could then inform the choice of drugs to include in the regimen of a patient with a particular pattern of mutations.
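The screening problem described above (one effect estimate and one significance measure per mutation, across many mutations) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's method: it assumes each patient's viral-load time course is collapsed to a single number (here, a least-squares slope of log10 viral load over time), estimates each mutation's effect as a crude difference in mean slope between carriers and non-carriers (ignoring confounding by co-occurring mutations and other covariates, which an adjusted variable-importance analysis would address), computes a two-sided permutation p-value per mutation, and applies the Benjamini-Hochberg false discovery rate adjustment across mutations. All function names, mutation labels, and data below are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def per_patient_slope(times: Sequence[float], log_vl: Sequence[float]) -> float:
    """Least-squares slope of log10 viral load against time for one patient.

    Assumption: the time course is summarized by a single linear trend;
    requires at least two distinct measurement times.
    """
    t_bar, y_bar = mean(times), mean(log_vl)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, log_vl))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def mean_difference(outcome: Sequence[float], mutation: Sequence[int]) -> float:
    """Mean outcome with the mutation (1) minus mean outcome without it (0).

    Unadjusted: ignores other mutations. Both groups must be non-empty.
    """
    carriers = [y for y, m in zip(outcome, mutation) if m == 1]
    others = [y for y, m in zip(outcome, mutation) if m == 0]
    return mean(carriers) - mean(others)


def permutation_p_value(outcome: Sequence[float], mutation: Sequence[int],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the unadjusted mean difference."""
    rng = random.Random(seed)
    observed = abs(mean_difference(outcome, mutation))
    labels = list(mutation)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling preserves the number of carriers
        if abs(mean_difference(outcome, labels)) >= observed:
            extreme += 1
    # Counting the observed labelling as one permutation keeps p > 0.
    return (extreme + 1) / (n_perm + 1)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_mutations(outcome: Sequence[float],
                     mutations: Dict[str, Sequence[int]],
                     n_perm: int = 2000) -> List[Tuple[str, float, float, float]]:
    """Return (name, estimate, raw p, BH-adjusted p), most significant first."""
    names = list(mutations)
    estimates = [mean_difference(outcome, mutations[n]) for n in names]
    raw = [permutation_p_value(outcome, mutations[n], n_perm, seed=k)
           for k, n in enumerate(names)]
    adjusted = benjamini_hochberg(raw)
    rows = list(zip(names, estimates, raw, adjusted))
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    # Hypothetical toy data: 4 patients, viral load measured at weeks 0, 4, 8.
    weeks = [0.0, 4.0, 8.0]
    trajectories = [
        [5.0, 4.0, 3.0],
        [5.2, 4.1, 3.1],
        [4.8, 4.7, 4.6],
        [5.1, 5.0, 4.9],
    ]
    slopes = [per_patient_slope(weeks, y) for y in trajectories]
    mutations = {"mut_A": [0, 0, 1, 1], "mut_B": [1, 0, 1, 0]}
    for name, est, p, q in screen_mutations(slopes, mutations):
        print(f"{name}: estimate={est:.3f} raw_p={p:.3f} bh_p={q:.3f}")
```

Because the permutation test shuffles mutation labels, each mutation must be carried by at least one patient and absent in at least one; a mutation present in everyone (or no one) has no estimable marginal difference. The permutation test is used here only to keep the sketch free of third-party dependencies; with so few patients, as in the toy data, the smallest attainable p-value is coarse.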

Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Bembom, O., Petersen, M.L., van der Laan, M.J. (2007). Identifying Important Explanatory Variables for Time-Varying Outcomes. In: Dubitzky, W., Granzow, M., Berrar, D. (eds) Fundamentals of Data Mining in Genomics and Proteomics. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47509-7_11
