Nonparametric Methods for Molecular Biology

  • Knut M. Wittkowski
  • Tingting Song
Part of the Methods in Molecular Biology book series (MIMB, volume 620)


In 2003, the completion of the Human Genome Project (1), together with advances in computational resources (2), was expected to launch an era in which the genetic and genomic contributions to many common diseases would be found. In the years that followed, however, researchers became increasingly frustrated as most reported ‘findings’ could not be replicated in independent studies (3). To improve the signal-to-noise ratio, it was suggested that the number of cases be increased to tens of thousands (4), a requirement that would dramatically restrict the scope of personalized medicine. Similarly, there was little success in elucidating the gene–gene interactions involved in complex diseases, or even in developing criteria for assessing their phenotypes. As a partial solution to these enigmata, we introduce here a class of statistical methods as the ‘missing link’ between advances in genetics and informatics. As a first step, we provide a unifying view of a plethora of nonparametric tests, developed mainly in the 1940s, all of which can be expressed as U-statistics. We then extend this approach to reflect categorical and ordinal relationships between variables, resulting in a flexible and powerful approach for dealing with the impact of (i) multiallelic genetic loci, (ii) poly-locus genetic regions, and (iii) oligo-genetic and oligo-genomic collaborative interactions on complex phenotypes.
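The abstract's central technical claim is that many classical nonparametric tests can be written as U-statistics, i.e., sums of a kernel evaluated over pairs of observations, and that the same pairwise idea extends to ordinal comparisons of multivariate profiles. The Python sketch below illustrates both ideas under stated assumptions; it is not the chapter's own implementation. The function names (`mann_whitney_u`, `compare`, `u_scores`) are hypothetical, the Mann–Whitney kernel counts ties as 1/2, and the multivariate comparison assumes one common partial ordering (a profile exceeds another only if it is at least as large in every variable and larger in at least one).

```python
from itertools import product
from typing import List, Sequence


def sign(v: float) -> int:
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """Mann-Whitney statistic written as a two-sample U-statistic.

    Kernel (assumed here): h(x_i, y_j) = 1 if y_j > x_i, 1/2 if tied, 0 otherwise.
    Summing h over all (x_i, y_j) pairs counts the pairs in which y is larger.
    """
    return sum(
        1.0 if yj > xi else 0.5 if yj == xi else 0.0
        for xi, yj in product(x, y)
    )


def compare(a: Sequence[float], b: Sequence[float]) -> int:
    """Partial-order comparison of two multivariate profiles (an assumption).

    +1 if a >= b in every variable and a > b in at least one,
    -1 if the reverse holds, 0 if the profiles are equal or not comparable.
    """
    diffs = [sign(ai - bi) for ai, bi in zip(a, b)]
    if all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
        return 1
    if all(d <= 0 for d in diffs) and any(d < 0 for d in diffs):
        return -1
    return 0


def u_scores(profiles: Sequence[Sequence[float]]) -> List[int]:
    """Score each subject as (#subjects it exceeds) - (#subjects exceeding it)."""
    return [sum(compare(p, q) for q in profiles) for p in profiles]
```

For example, `mann_whitney_u([1, 2, 3], [2, 4])` returns 4.5, and `u_scores([[1, 1], [2, 2], [2, 0], [0, 3]])` returns `[-1, 2, -1, 0]`. Profiles that are not comparable under the partial ordering contribute 0 rather than being forced into a total order, which is one way to respect ordinal (rather than interval) relationships between variables.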

Key words

Genome-wide Association Study (GWAS), Family-based Association Test (FBAT), High-Density Oligo-Nucleotide Assay (HDONA), coregulation, collaboration, multiallelic, multilocus, multivariate, gene–gene interaction, epistasis, personalized medicine



The work was supported in part by Grant No. UL1RR024143 from the U.S. National Center for Research Resources (NCRR). Of the many colleagues who have contributed to this chapter through discussions and suggestions, I would like to thank, in particular, Jose F. Morales, Ephraim Sehayek, Sreeram Ramagopalan, and Martina Durner for their input on the biological background, Sreeram Ramagopalan, Bill Raynor, and Norman Cliff for their helpful comments, an anonymous reviewer for an inspiring discussion, and Daniel Eckardt for help with Latin grammar.


References

  1. Collins, F. S., Green, E. D., Guttmacher, A. E., and Guyer, M. S. (2003) A vision for the future of genomics research, Nature 422, 835–847.
  2. Butler, D. (2003) The Grid: tomorrow’s computing today, Nature 422, 799–800.
  3. Pearson, T. A., and Manolio, T. A. (2008) How to interpret a genome-wide association study, JAMA 299, 1335–1344.
  4. Psychiatric GWAS Consortium Coordinating Committee (2009) Genomewide association studies: history, rationale, and prospects for psychiatric disorders, Am J Psychiatry 166, 540–556.
  5. Scheffé, H. (1959) The Analysis of Variance, Wiley, New York, NY.
  6. Arbuthnot, J. (1710) An argument for divine providence taken from the constant regularity observ’d in the births of both sexes, Philos Trans R Soc London 27, 186–190.
  7. Fisher, R. A. (1935) The Design of Experiments, Oliver & Boyd, Edinburgh.
  8. Cliff, N. (1996) Answering ordinal questions with ordinal data using ordinal statistics, Multivariate Behav Res 31, 331–350.
  9. Cliff, N. (1996) Ordinal Methods for Behavioral Data Analysis, Lawrence Erlbaum, Mahwah, NJ.
  10. Wilcoxon, F. (1945) Individual comparisons by ranking methods, Biometrics Bull 1, 80–83.
  11. Mann, H. B., and Whitney, D. R. (1947) On a test of whether one of two random variables is stochastically larger than the other, Ann Math Stat 18, 50–60.
  12. Kruskal, W. H., and Wallis, W. A. (1952) Use of ranks in one-criterion variance analysis, J Am Stat Assoc 47, 583–621.
  13. Lewis, C. T., and Short, C. (1879) A Latin Dictionary, Clarendon, Oxford.
  14. Georges, K. E. (1918) Ausführliches lateinisch-deutsches Handwörterbuch, Hahn, Hannover.
  15. Tusher, V. G., Tibshirani, R., and Chu, G. (2001) Significance analysis of microarrays applied to the ionizing radiation response (vol 98, pg 5116), Proc Natl Acad Sci USA 98, 10515.
  16. van de Wiel, M. A. (2004) Significance analysis of microarrays using rank scores, Kwantitatieve Methoden 71, 25–37.
  17. Wang, Z., Gerstein, M., and Snyder, M. (2009) RNA-Seq: a revolutionary tool for transcriptomics, Nat Rev Genet 10, 57–63.
  18. McNemar, Q. (1947) Note on the sampling error of the differences between correlated proportions or percentages, Psychometrika 12, 153–157.
  19. Gauss, C. F. (1823) Theoria combinationis observationum erroribus minimis obnoxiae, Dieterich, Goettingen.
  20. Coakley, C. W., and Heise, M. A. (1996) Versions of the sign test in the presence of ties, Biometrics 52, 1242–1251.
  21. Dixon, W. J., and Mood, A. M. (1946) The statistical sign test, J Am Stat Assoc 41, 557–566.
  22. Dixon, W. J., and Massey, F. J., Jr. (1951) An Introduction to Statistical Analysis, McGraw-Hill, New York.
  23. Rayner, J. C. W., and Best, D. J. (1999) Modelling ties in the sign test, Biometrics 55, 663–665.
  24. Rao, P. V., and Kupper, L. L. (1967) Ties in paired-comparison experiments: a generalization of the Bradley–Terry model, J Am Stat Assoc 62, 194–204.
  25. David, H. A. (1988) The Method of Paired Comparisons, 2nd ed., Griffin, London.
  26. Stern, H. (1990) A continuum of paired comparisons models, Biometrika 77, 265–273.
  27. Yan, T., Yang, Y. N., Cheng, X., DeAngelis, M. M., Hoh, J., and Zhang, H. (2009) Genotypic association analysis using discordant-relative-pairs, Ann Hum Genet 73, 84–94.
  28. Spielman, R. S., McGinnis, R. E., and Ewens, W. J. (1993) Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM), Am J Hum Genet 52, 506–516.
  29. Wittkowski, K. M. (1998) Versions of the sign test in the presence of ties, Biometrics 54, 789–791.
  30. Wittkowski, K. M. (1989) An asymptotic UMP sign test for discretized data, Statistician 38, 93–96.
  31. Wittkowski, K. M., and Liu, X. (2002) A statistically valid alternative to the TDT, Hum Hered 54, 157–164.
  32. Sasieni, P. D. (1997) From genotypes to genes: doubling the sample size, Biometrics 53, 1253–1261.
  33. Wittkowski, K. M. (1988) Friedman-type statistics and consistent multiple comparisons for unbalanced designs, J Am Stat Assoc 83, 1163–1170.
  34. Student (1908) On the probable error of a mean, Biometrika 6, 1–25.
  35. Ramagopalan, S. V., McMahon, R., Dyment, D. A., Sadovnick, A. D., Ebers, G. C., and Wittkowski, K. M. (2009) An extension to a statistical approach for family based association studies provides insights into genetic risk factors for multiple sclerosis in the HLA-DRB1 gene, BMC Med Genet 10, 10.
  36. Hafler, D. A., Compston, A., Sawcer, S., Lander, E. S., Daly, M. J., De Jager, P. L., de Bakker, P. I. W., Gabriel, S. B., Mirel, D. B., Ivinson, A. J., Pericak-Vance, M. A., Gregory, S. G., Rioux, J. D., McCauley, J. L., Haines, J. L., Barcellos, L. F., Cree, B., Oksenberg, J. R., and Hauser, S. L. (2007) Risk alleles for multiple sclerosis identified by a genomewide study, N Engl J Med 357, 851–862.
  37. Barcellos, L. F., Sawcer, S., Ramsay, P. P., Baranzini, S. E., Thomson, G., Briggs, F., Cree, B. C., Begovich, A. B., Villoslada, P., Montalban, X., Uccelli, A., Savettieri, G., Lincoln, R. R., DeLoa, C., Haines, J. L., Pericak-Vance, M. A., Compston, A., Hauser, S. L., and Oksenberg, J. R. (2006) Heterogeneity at the HLA-DRB1 locus and risk for multiple sclerosis, Hum Mol Genet 15, 2813–2824.
  38. Ramagopalan, S., and Ebers, G. (2009) Multiple sclerosis: major histocompatibility complexity and antigen presentation, Genome Med 1, 105.
  39. Suárez-Fariñas, M., Haider, A., and Wittkowski, K. M. (2005) “Harshlighting” small blemishes on microarrays, BMC Bioinformatics 6, 65.
  40. Suárez-Fariñas, M., Pellegrino, M., Wittkowski, K. M., and Magnasco, M. O. (2005) Harshlight: a “corrective make-up” program for microarray chips, BMC Bioinformatics 6, 294.
  41. Arteaga-Salas, J. M., Harrison, A. P., and Upton, G. J. G. (2008) Reducing spatial flaws in oligonucleotide arrays by using neighborhood information, Stat Appl Genet Mol Biol 7, 19.
  42. Arteaga-Salas, J. M., Zuzan, H., Langdon, W. B., Upton, G. J. G., and Harrison, A. P. (2008) An overview of image-processing methods for Affymetrix GeneChips, Brief Bioinform 9, 25–33.
  43. Cairns, J. M., Dunning, M. J., Ritchie, M. E., Russell, R., and Lynch, A. G. (2008) BASH: a tool for managing BeadArray spatial artefacts, Bioinformatics 24, 2921–2922.
  44. Deuchler, G. (1914) Über die Methoden der Korrelationsrechnung in der Pädagogik und Psychologie, Z pädagog Psychol 15, 114–131, 145–159, 229–242.
  45. Morales, J. F., Song, T., Auerbach, A. D., and Wittkowski, K. M. (2008) Phenotyping genetic diseases using an extension of μ-scores for multivariate data, Stat Appl Genet Mol Biol 7, 19.
  46. Kehoe, J. F., and Cliff, N. (1975) Interord: a computer-interactive Fortran IV program for developing simple orders, Educ Psychol Meas 35, 675–678.
  47. Kruskal, W. H. (1957) Historical notes on the Wilcoxon unpaired two-sample test, J Am Stat Assoc 52, 356–360.
  48. Friedman, M. (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance, J Am Stat Assoc 32, 675–701.
  49. McLean, I., and Urken, A. B. (1995) On elections by ballot, in Classics of Social Choice (McLean, I., and Urken, A. B., Eds.), pp. 83–89, University of Michigan Press, Ann Arbor, MI.
  50. Hägele, G., and Pukelsheim, F. (2001) Llull’s writings on electoral systems, Stud Lulliana 41, 3–38.
  51. Benard, A., and van Elteren, P. H. (1953) A generalization of the method of m rankings, Indagationes Math 15, 358–369.
  52. van Elteren, P., and Noether, G. E. (1959) The asymptotic efficiency of the χr² test for a balanced incomplete block design, Biometrika 46, 475–477.
  53. Durbin, J. (1951) Incomplete blocks in ranking experiments, Br J Stat Psychol 4, 85–90.
  54. Bradley, R. A., and Terry, M. E. (1952) Rank analysis of incomplete block designs: I. The method of paired comparisons, Biometrika 39, 324–345.
  55. Prentice, M. J. (1979) On the problem of m incomplete rankings, Biometrika 66, 167–170.
  56. Alvo, M., and Cabilio, P. (2005) General scores statistics on ranks in the analysis of unbalanced designs, Can J Stat 33, 115–129.
  57. Gao, X., and Alvo, M. (2005) A unified nonparametric approach for unbalanced factorial designs, J Am Stat Assoc 100, 926–941.
  58. Lam, F. C., and Longnecker, M. T. (1983) A modified Wilcoxon rank sum test for paired data, Biometrika 70, 510–513.
  59. Cronbach, L. J., and Meehl, P. E. (1955) Construct validity in psychological tests, Psychol Bull 52, 281–302.
  60. Popper, K. R. (1937) Logik der Forschung, Julius Springer, Wien.
  61. Delbecq, A. (1975) Group techniques for program planning, Scott Foresman, Glenview, IL.
  62. Wittkowski, K. M., Song, T., Anderson, K., and Daniels, J. E. (2008) U-scores for multivariate data in sports, J Quant Anal Sports 4, 7.
  63. Freimer, N., and Sabatti, C. (2003) The human phenome project, Nat Genet 34, 15–21.
  64. Wittkowski, K. M. (1980) Ein nichtparametrischer Test im Stufenblockplan [A nonparametric test for the step-down design], Institut für Medizinische Statistik, Georg-August-Universität, Göttingen, D.
  65. Wittkowski, K. M. (1984) Semiquantitative Merkmale in der nichtparametrischen Statistik, in Der Beitrag der Informationsverarbeitung zum Fortschritt der Medizin (Köhler, C. O., Wagner, E., and Tautu, P., Eds.), pp. 100–105, Springer, Berlin, D.
  66. Wittkowski, K. M. (1988) Small sample properties of rank tests for incomplete unbalanced designs, Biom J 30, 799–808.
  67. Wittkowski, K. M. (1992) An extension to Wittkowski, J Am Stat Assoc 87, 258.
  68. Einsele, H., Ehninger, G., Hebart, H., Wittkowski, K. M., Schuler, U., Jahn, G., Mackes, P., Herter, M., Klingebiel, T., Löffler, J., et al. (1995) Polymerase chain reaction monitoring reduces the incidence of cytomegalovirus disease and the duration and side effects of antiviral therapy after bone marrow transplantation, Blood 86, 2815–2820.
  69. Talaat, M., Wittkowski, K. M., Husein, M. H., and Barakat, R. (1998) A new procedure to assess individual risk of exposure to cercariae from multivariate questionnaire data, in Reproductive Health and Infectious Diseases in the Middle East (Barlow, R., and Brown, J. W., Eds.), pp. 167–174, Ashgate, Aldershot, UK.
  70. Susser, E., Desvarieux, M., and Wittkowski, K. M. (1998) Reporting sexual risk behavior for HIV: a practical risk index and a method for improving risk indices, Am J Public Health 88, 671–674.
  71. Wittkowski, K. M., Susser, E., and Dietz, K. (1998) The protective effect of condoms and nonoxynol-9 against HIV infection, Am J Public Health 88, 590–596, 972.
  72. Banchereau, J., Palucka, A. K., Dhodapkar, M., Burkeholder, S., Taquet, N., Rolland, A., Taquet, S., Coquery, S., Wittkowski, K. M., Bhardwaj, N., Pineiro, L., Steinman, R., and Fay, J. (2001) Immune and clinical responses after vaccination of patients with metastatic melanoma with CD34+ hematopoietic progenitor-derived dendritic cells, Cancer Res 61, 6451–6458.
  73. Hoeffding, W. (1948) A class of statistics with asymptotically normal distribution, Ann Math Stat 19, 293–325.
  74. Wittkowski, K. M. (2003) Novel methods for multivariate ordinal data applied to genetic diplotypes, genomic pathways, risk profiles, and pattern similarity, Comput Sci Stat 35, 626–646.
  75. Wittkowski, K. M., and Liu, X. (2004) Beyond the TDT: rejoinder to Ewens and Spielman, Hum Hered 58, 60–61.
  76. Wittkowski, K. M., Lee, E., Nussbaum, R., Chamian, F. N., and Krueger, J. G. (2004) Combining several ordinal measures in clinical studies, Stat Med 23, 1579–1592.
  77. Gehan, E. A. (1965) A generalised two-sample Wilcoxon test for doubly censored samples, Biometrika 52, 650–653.
  78. Gehan, E. A. (1965) A generalised Wilcoxon test for comparing arbitrarily singly censored samples, Biometrika 52, 203–223.
  79. Schemper, M. (1983) A nonparametric k-sample test for data defined by intervals, Stat Neerl 37, 69–71.
  80. Lehmann, E. L. (1951) Consistency and unbiasedness of certain nonparametric tests, Ann Math Stat 22, 165–179.
  81. Hoeffding, W. (1994) The Collected Works of Wassily Hoeffding, Springer, New York.
  82. Rosenbaum, P. R. (1994) Coherence in observational studies, Biometrics 50, 368–374.
  83. Song, T., Coffran, C., and Wittkowski, K. M. (2007) Screening for gene expression profiles and epistasis between diplotypes with S-Plus on a grid, Stat Comput Graph 18, 20–25.
  84. Cherchye, L., and Vermeulen, F. (2006) Robust rankings of multidimensional performances: an application to Tour de France racing cyclists, J Sports Econ 7, 359–373.
  85. Quaia, E., D’Onofrio, M., Cabassa, P., Vecchiato, F., Caffarri, S., Pittiani, F., Wittkowski, K. M., and Cova, M. A. (2007) Diagnostic value of hepatocellular nodule vascularity after microbubble injection for characterizing malignancy in patients with cirrhosis, Am J Roentgenol 189, 1474–1483.
  86. Ramamoorthi, R. V., Rossano, M. G., Paneth, N., Gardiner, J. C., Diamond, M. P., Puscheck, E., Daly, D. C., Potter, R. C., and Wirth, J. J. (2008) An application of multivariate ranks to assess effects from combining factors: metal exposures and semen analysis outcomes, Stat Med 27, 3503–3514.
  87. Shockley, W., Bardeen, J., and Brattain, W. H. (1948) The electronic theory of the transistor, Science 108, 678–679.
  88. Haberle, L., Pfahlberg, A., and Gefeller, O. (2009) Assessment of multiple ordinal endpoints, Biom J 51, 217–226.
  89. O’Brien, P. C. (1984) Procedures for comparing samples with multiple endpoints, Biometrics 40, 1079–1087.
  90. Diana, M., Song, T., and Wittkowski, K. (2009) Studying travel-related individual assessments and desires by combining hierarchically structured ordinal variables, Transp 36, 187–206.
  91. Kendall, M. G. (1938) A new measure of rank correlation, Biometrika 30, 81–93.
  92. Jonckheere, A. R. (1954) A distribution-free k-sample test against ordered alternatives, Biometrika 41, 133–145.
  93. Terpstra, T. J. (1952) The asymptotic normality and consistency of Kendall’s test against trend when ties are present in one ranking, Indagationes Math 14, 327–333.
  94. Spangler, R., Wittkowski, K. M., Goddard, N. L., Avena, N. M., Hoebel, B. G., and Leibowitz, S. F. (2004) Opiate-like effects of sugar on gene expression in reward areas of the rat brain, Mol Brain Res 124, 134–142.
  95. Morales, J. F., Song, T., Wittkowski, K. M., and Auerbach, A. D. (submitted) A statistical systems biology approach to FANCC gene expression suggests drug targets for Fanconi anemia.
  96. Armitage, P. (1955) Tests for linear trends in proportions and frequencies, Biometrics 11, 375–386.
  97. Janka, G. E., and Schneider, E. M. (2004) Modern management of children with haemophagocytic lymphohistiocytosis, Br J Haematol 124, 4–14.
  98. Seybold, M. P., Wittkowski, K. M., and Schneider, E. M. (2008) Biomarker analysis using a non-parametric selection procedure to discriminate the phagocytic syndromes HLH (hemophagocytic lymphohistiocytosis) and MAS (macrophage activation syndrome), Shock 29, 90.
  99. Kraft, P., and Hunter, D. J. (2009) Genetic risk prediction: are we there yet?, N Engl J Med 360, 1701–1703.
  100. Wittkowski, K. M. (1990) Statistical knowledge-based systems: critical remarks and requirements for approval, Comput Methods Programs Biomed 33, 255–259.
  101. Akritas, M. G., Arnold, S. F., and Brunner, E. (1997) Nonparametric hypotheses and rank statistics for unbalanced factorial designs. Part I, J Am Stat Assoc 92, 258–265.
  102. Brunner, E., Munzel, U., and Puri, M. L. (1999) Rank-score tests in factorial designs with repeated measures, J Multivar Anal 70, 286–317.

Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Knut M. Wittkowski (1)
  • Tingting Song (1)
  1. Center for Clinical and Translational Science, The Rockefeller University, New York, USA
