Two Samples

  • Edgar Brunner
  • Arne C. Bathke
  • Frank Konietschke
Part of the Springer Series in Statistics book series (SSS)


This section introduces nonparametric methods for two independent samples. The data are observations on \(n_1\) individuals (subjects, experimental units) in one group and on \(n_2\) other individuals in a second group. The groups may correspond to different treatments to which the subjects are randomly assigned, or to different sub-populations (e.g., male vs. female). Mathematically, each of the two samples is modeled as \(n_i\) independent and identically distributed random variables \(X_{i1}, \ldots , X_{in_i}\), \(i = 1, 2\), with independence assumed across groups. With the unified nonparametric approach described in this section, continuous and discrete data need not be considered separately. In particular, no separate correction for ties is needed, although such corrections often had to be applied in the classical framework of nonparametric statistics. The methods described here are valid for data with or without ties, specifically for continuous quantitative data, count data, ordinal data, and even binary (dichotomous) data. Real data examples illustrate each of these cases. The corresponding data analyses are demonstrated using R and SAS. In the subsequent Chap. 4, the results presented here for two samples (a = 2) are generalized to several samples (a ≥ 2).
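To make the "with or without ties" point concrete, here is a minimal sketch in Python. The abstract does not name a specific statistic, so the sketch assumes a Mann-Whitney-type relative effect, \(p = P(X_{11} < X_{21}) + \tfrac{1}{2} P(X_{11} = X_{21})\), estimated by counting pairs. The function name `relative_effect` and the example data are hypothetical and are not taken from the chapter or its R and SAS analyses.

```python
from fractions import Fraction


def relative_effect(x1, x2):
    """Pairwise-counting estimate of p = P(X1 < X2) + 1/2 * P(X1 = X2).

    Tied pairs contribute 1/2, so the same formula applies to continuous,
    count, ordinal (numerically coded), and binary data alike.
    This is an illustrative assumption, not the chapter's own code.
    """
    if len(x1) == 0 or len(x2) == 0:
        raise ValueError("both samples must be non-empty")
    score = Fraction(0)
    for a in x1:
        for b in x2:
            if a < b:
                score += 1                 # pair favours the second sample
            elif a == b:
                score += Fraction(1, 2)    # tie: split the contribution
    return score / (len(x1) * len(x2))


# Hypothetical ordinal scores (heavily tied) and binary outcomes.
ordinal_group1 = [1, 2, 2, 3]
ordinal_group2 = [2, 3, 3, 4]
binary_group1 = [0, 0, 1]
binary_group2 = [0, 1, 1]

p_ordinal = relative_effect(ordinal_group1, ordinal_group2)  # 13/16
p_binary = relative_effect(binary_group1, binary_group2)     # 2/3
```

A value of 1/2 indicates no tendency toward larger or smaller values in either group; comparing a sample with itself returns exactly 1/2 regardless of ties. The double loop is quadratic in the sample sizes and is meant for illustration only.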



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Edgar Brunner (1)
  • Arne C. Bathke (2)
  • Frank Konietschke (3)
  1. Department of Medical Statistics, University of Göttingen, University Medical Center, Göttingen, Germany
  2. Department of Mathematics, University of Salzburg, Salzburg, Austria
  3. Institute of Biometry and Clinical Epidemiology, Charité – University Medical School, Berlin, Germany