# Two Samples

• Edgar Brunner
• Arne C. Bathke
• Frank Konietschke
Chapter
Part of the Springer Series in Statistics book series (SSS)

## Abstract

This section introduces nonparametric methods for two independent samples. The data consist of observations on $$n_1$$ individuals (subjects, experimental units) in one group and on $$n_2$$ other individuals in a second group. The groups could correspond to different treatments to which the subjects are randomly assigned, or they could refer to different sub-populations (e.g., male vs. female). Mathematically, this situation is modeled by letting sample i consist of $$n_i$$ independent and identically distributed random variables $$X_{i1}, \ldots , X_{in_i}$$, i = 1, 2, and by assuming independence across groups. With the unified nonparametric approach described in this section, it is not necessary to consider continuous and discrete data separately. In particular, no correction for ties is needed, although such a correction often had to be applied in the classical framework of nonparametric statistics. The methods described here are valid for data with or without ties, specifically for continuous quantitative data, count data, ordinal data, and even binary (dichotomous) data. Real data examples illustrate each of these cases. The corresponding data analyses are demonstrated using R and SAS. In the subsequent chapter, the results presented here for two samples (a = 2) are generalized to several samples (a ≥ 2).
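To make the two-sample setting concrete, the following Python sketch estimates the quantity P(X1 < X2) + 0.5 · P(X1 = X2), often called the relative effect. Its use here is an assumption: the abstract does not name the effect measure, and the chapter itself works in R and SAS. The function names and the toy data are hypothetical. The sketch only illustrates why a separate tie correction is unnecessary: ties contribute one half in the pairwise count, and midranks give the same estimate.

```python
from itertools import product


def relative_effect(x1, x2):
    """Pairwise estimate of P(X1 < X2) + 0.5 * P(X1 = X2)."""
    score = 0.0
    for a, b in product(x1, x2):
        if a < b:
            score += 1.0
        elif a == b:
            score += 0.5  # a tie counts as half a "win"
    return score / (len(x1) * len(x2))


def midranks(values):
    """Midrank of each value: (#smaller) + (#equal + 1) / 2."""
    return [
        sum(v < w for v in values) + (sum(v == w for v in values) + 1) / 2
        for w in values
    ]


def relative_effect_from_ranks(x1, x2):
    """Same estimate, computed from midranks of the pooled sample."""
    n1, n2 = len(x1), len(x2)
    ranks = midranks(list(x1) + list(x2))
    mean_rank_2 = sum(ranks[n1:]) / n2
    return (mean_rank_2 - (n2 + 1) / 2) / n1


# Hypothetical ordinal scores with ties, for illustration only.
group1 = [1, 2, 2, 3]
group2 = [2, 3, 3, 4]
print(relative_effect(group1, group2))             # 0.8125
print(relative_effect_from_ranks(group1, group2))  # 0.8125
```

The same two functions apply unchanged to binary data such as `[0, 0, 1]` and `[0, 1, 1]`, which matches the abstract's claim that continuous, count, ordinal, and dichotomous data need not be treated separately.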

© Springer Nature Switzerland AG 2018

## Authors and Affiliations

• Edgar Brunner (1)
• Arne C. Bathke (2)
• Frank Konietschke (3)

1. Department of Medical Statistics, University of Göttingen, University Medical Center, Göttingen, Germany
2. Department of Mathematics, University of Salzburg, Salzburg, Austria
3. Institute of Biometry and Clinical Epidemiology, Charité – University Medical School, Berlin, Germany