
TEST

pp 1–26

Testing equality of a large number of densities under mixing conditions

  • Marta Cousido-Rocha
  • Jacobo de Uña-Álvarez
  • Jeffrey D. Hart
Original Paper

Abstract

In certain settings, such as microarray data, the sampling information is formed by a large number of possibly dependent small data sets. In special applications, for example in order to perform clustering, the researcher aims to verify whether all data sets have a common distribution. For this reason we propose a formal test for the null hypothesis that all data sets come from a single distribution. The asymptotic setting is that in which the number of small data sets goes to infinity, while the sample size remains fixed. The asymptotic null distribution of the proposed test is derived under mixing conditions on the sequence of small data sets, and the power properties of our test under two reasonable fixed alternatives are investigated. A simulation study is conducted, showing that the test respects the nominal level, and that it has a power which tends to 1 when the number of data sets tends to infinity. An illustration involving microarray data is provided.

Keywords

Dependent data; Kernel density estimation; k-Sample problem; Smooth tests; U-statistics

Mathematics Subject Classification

62G10 

Notes

Acknowledgements

This work received financial support from the Call 2015 Grants for Ph.D. contracts for training of doctors of the Ministry of Economy and Competitiveness, co-financed by the European Social Fund (Ref. BES-2015-074958). We acknowledge support from the MTM2014-55966-P project, Ministry of Economy and Competitiveness, and the MTM2017-89422-P project, Ministry of Economy, Industry and Competitiveness, State Research Agency, and Regional Development Fund, EU. We also acknowledge the financial support provided by the SiDOR research group through the grant Competitive Reference Group, 2016–2019 (ED431C 2016/040), funded by the “Consellería de Cultura, Educación e Ordenación Universitaria, Xunta de Galicia”. Finally, the first author thanks the University of Vigo and its Escola Internacional de Doutoramento (EIDO) for the financial support provided through mobility doctorate grants. The authors also thank Professors Raymond J. Carroll and Robert Chapkin for allowing the use of their data.

Supplementary material

11749_2018_625_MOESM1_ESM.pdf (399 kb)
The Supplementary Material includes formal definitions of mixing dependence, stationarity, and the regularity conditions needed for the technical results; a remark about Theorem 5; the proof of Theorem 6; an additional real data analysis; and additional simulation results. (pdf 394KB)

Copyright information

© Sociedad de Estadística e Investigación Operativa 2019

Authors and Affiliations

  1. Department of Statistics and Operations Research and SiDOR Research Group, Faculty of Economics, University of Vigo, Vigo, Spain
  2. Centro de Investigaciones Biomédicas (CINBIO), University of Vigo, Vigo, Spain
  3. Department of Statistics, Texas A&M University, College Station, USA
