Identifying Important Explanatory Variables for Time-Varying Outcomes

Bembom, Oliver; Petersen, Maya L.; van der Laan, Mark J.

doi:10.1007/978-0-387-47509-7_11

Abstract

Many applications in modern biology measure a large number of genomic or proteomic covariates with the aim of assessing the impact of each covariate on a particular outcome of interest. In a study that follows a cohort of HIV-positive patients over time, for example, a researcher may genotype the virus infecting each patient to ascertain the presence or absence of a large number of mutations, in the hope of identifying mutations that affect how a patient’s plasma HIV RNA level (viral load) responds to a new drug regimen. Along with an estimate of the impact of each mutation on the time course of viral load, the researcher would generally like a measure of the statistical significance of each estimate in order to identify those mutations that are most likely to be genuinely related to the outcome. Such information could then inform the decision of which drugs to include in the regimen of a patient with a particular pattern of mutations.

Author information

Authors and Affiliations

Division of Biostatistics, University of California, Berkeley, USA
Oliver Bembom, Maya L. Petersen & Mark J. van der Laan

Editor information

Editors and Affiliations

University of Ulster, Coleraine, Northern Ireland
Werner Dubitzky & Daniel Berrar
Quantiom Bioinformatics GmbH & Co. KG, Weingarten/Baden, Germany
Martin Granzow

About this chapter

Cite this chapter

Bembom, O., Petersen, M.L., van der Laan, M.J. (2007). Identifying Important Explanatory Variables for Time-Varying Outcomes. In: Dubitzky, W., Granzow, M., Berrar, D. (eds) Fundamentals of Data Mining in Genomics and Proteomics. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47509-7_11

DOI: https://doi.org/10.1007/978-0-387-47509-7_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-47508-0
Online ISBN: 978-0-387-47509-7
eBook Packages: Biomedical and Life Sciences (R0)
