Stratified Test Alleviates Batch Effects in Single-Cell Data

  • Shaoheng Liang
  • Qingnan Liang
  • Rui Chen
  • Ken Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12099)


Analyzing single-cell sequencing data across batches is challenging. We find that the Van Elteren test, a stratified version of the Wilcoxon rank-sum test, elegantly mitigates the problem. We also modify the common language effect size to supplement this test, further improving its utility. On both simulated and real patient data, we show that the Van Elteren test controls both false positives and false negatives. The modified effect size also estimates the differences between cell types more accurately.
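The Van Elteren test can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Within each batch, the Wilcoxon rank-sum statistic of group x is computed using midranks for ties. Each batch is then weighted by 1/(N_k + 1), where N_k is the number of cells in batch k, and the weighted sum is referred to a normal approximation.

The function names `midranks` and `van_elteren` are hypothetical. The returned effect size is also an assumption: it sums the Mann-Whitney U over batches and divides by the number of within-batch cell pairs. This is one possible stratified variant of the common language effect size, and not necessarily the modification proposed in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last element of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def van_elteren(
    strata: Dict[str, Tuple[Sequence[float], Sequence[float]]],
) -> Tuple[float, float, float]:
    """Rank-sum test of group x versus group y, stratified by batch.

    `strata` maps a batch label to (x_values, y_values) measured in that batch.
    Returns (z, two_sided_p, effect_size). The effect size is an ASSUMED
    stratified CLES: within-batch Mann-Whitney U pooled over within-batch pairs.
    """
    stat = expected = variance = 0.0
    u_total = pairs_total = 0.0
    for x, y in strata.values():
        m, n = len(x), len(y)
        if m == 0 or n == 0:
            continue  # a batch missing either group carries no information
        big_n = m + n
        pooled = list(x) + list(y)
        ranks = midranks(pooled)
        w = sum(ranks[:m])  # Wilcoxon rank sum of x within this batch
        # Tie correction: sum of (t^3 - t) over runs of tied values.
        counts: Dict[float, int] = {}
        for v in pooled:
            counts[v] = counts.get(v, 0) + 1
        ties = sum(c ** 3 - c for c in counts.values())
        var_w = m * n / 12.0 * ((big_n + 1) - ties / (big_n * (big_n - 1)))
        weight = 1.0 / (big_n + 1)  # Van Elteren's stratum weight
        stat += weight * w
        expected += weight * m * (big_n + 1) / 2.0
        variance += weight ** 2 * var_w
        # Mann-Whitney U: pairs with x > y, counting tied pairs as one half.
        u_total += w - m * (m + 1) / 2.0
        pairs_total += m * n
    if variance <= 0:
        raise ValueError("no batch contains both groups with untied values")
    z = (stat - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p, u_total / pairs_total
```

Consider toy data in which the x cells are concentrated in a high-expression batch. For example, `van_elteren({"A": ([10, 12, 14], [11]), "B": ([1], [0, 2, 4])})` returns a z close to 0 and an effect size of 0.5. Treating all eight cells as a single stratum instead gives U/(m·n) = 12/16 = 0.75. That inflation is driven purely by the batch, and stratifying is meant to remove it.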


Keywords: scRNA-seq analysis; Differential expression analysis; Batch effect; Wilcoxon rank-sum test; Van Elteren test


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Rice University, Houston, USA
  2. The University of Texas MD Anderson Cancer Center, Houston, USA
  3. Baylor College of Medicine, Houston, USA
