Abstract
A new method, continuous analysis of variance (CANOVA), is proposed for testing nonlinear dependence between two continuous variables, X and Y. The software is available at https://sourceforge.net/projects/canova. First, a neighborhood was defined for each data point based on its X value. Then, the variance of the Y values within each neighborhood was calculated. Last, permutations were conducted to evaluate the significance of the observed within-neighborhood variance. To assess the strength of CANOVA relative to six other methods, extensive simulations were performed to estimate false-positive rates and statistical power. Both simulated and real datasets (kidney cancer RNA-seq data) were analyzed. These analyses indicate that CANOVA is an efficient method for testing nonlinear correlation and has several advantages in real-data applications.
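The three steps above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the CANOVA software's implementation. It assumes that the neighborhood of each point is a window of k consecutive observations after sorting by X (k = 5 is a hypothetical default), that the statistic is the sum of the within-window variances of Y, and that the p-value is the fraction of permutations of Y whose statistic is as small as or smaller than the observed one, with the usual +1 correction. The function names are hypothetical.

```python
import random


def _population_variance(values):
    """Population variance (divide by n) of a short list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def neighborhood_variance(x, y, k=5):
    """Sum of within-neighborhood variances of y.

    Assumed neighborhood definition (hypothetical): after ordering the points
    by x, each window of k consecutive points forms one neighborhood.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 2 <= k <= len(x):
        raise ValueError("k must satisfy 2 <= k <= len(x)")
    # Reorder y by x so that points close in x sit next to each other.
    y_sorted = [yi for _, yi in sorted(zip(x, y), key=lambda pair: pair[0])]
    return sum(
        _population_variance(y_sorted[i:i + k])
        for i in range(len(y_sorted) - k + 1)
    )


def canova_like_test(x, y, k=5, n_perm=999, seed=None):
    """Permutation test of independence based on neighborhood variance.

    Returns (observed statistic, one-sided permutation p-value).
    """
    rng = random.Random(seed)
    observed = neighborhood_variance(x, y, k)
    y_perm = list(y)
    n_as_small = 0
    for _ in range(n_perm):
        # Shuffling y breaks any dependence on x but keeps y's marginal distribution.
        rng.shuffle(y_perm)
        if neighborhood_variance(x, y_perm, k) <= observed:
            n_as_small += 1
    # Dependence makes neighboring y values similar (a small statistic),
    # so count permutations whose statistic is at least as small as observed.
    p_value = (n_as_small + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.uniform(-1, 1) for _ in range(100)]
    # U-shaped dependence: Pearson's r is expected to be near zero,
    # yet y is (up to noise) a function of x.
    y_dep = [xi ** 2 + rng.gauss(0, 0.05) for xi in x]
    y_ind = [rng.gauss(0, 1) for _ in x]
    print("dependent:  ", canova_like_test(x, y_dep, seed=2))
    print("independent:", canova_like_test(x, y_ind, seed=2))
```

The one-sided direction follows from the idea described above: if Y depends on X, points that are close in X should have similar Y values, so the within-neighborhood variance should be smaller than under random shuffling of Y. The window-based neighborhood is only one plausible choice made for this sketch.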
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Ritter, M., Li, Y., Wang, Y., Yao, Y., Jin, L. (2018). Efficient Test for Nonlinear Dependence of Two Continuous Variables. In: Yao, Y. (eds) Applied Computational Genomics. Translational Bioinformatics, vol 13. Springer, Singapore. https://doi.org/10.1007/978-981-13-1071-3_8
DOI: https://doi.org/10.1007/978-981-13-1071-3_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1070-6
Online ISBN: 978-981-13-1071-3
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)