Abstract
Investigators routinely use unidimensional summaries for multidimensional data. In microarray data analysis, for example, the gene expression level is itself a unidimensional summary of probe-level or SNP measurements. In this paper, we propose a mixture factor model for the low-level data, which enables us to examine the adequacy of a unidimensional summary while accommodating known or latent subgroups in the population. We also develop screening procedures based on the proposed model to identify potentially informative genes in biomedical studies. As shown in our empirical studies, the proposed methods are often more effective than existing methods because the new model goes beyond the conventional unidimensional summaries of gene expression.
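To make the idea of "adequacy of a unidimensional summary" concrete, the following is a minimal sketch and not the mixture factor model proposed in the paper. It uses a simpler stand-in diagnostic: the share of total probe-level variance captured by the leading eigenvalue of the sample covariance matrix for one gene. All function names, loadings, sample sizes and noise levels are hypothetical choices made for illustration only.

```python
import random

def simulate(loadings_by_group, n_per_group, noise_sd, seed):
    """Simulate probe-level data for one gene (rows = samples, columns = probes).
    Each subgroup has its own loading vector (an illustrative assumption)."""
    rng = random.Random(seed)
    rows = []
    for loadings in loadings_by_group:
        for _ in range(n_per_group):
            f = rng.gauss(0.0, 1.0)  # latent expression factor for this sample
            rows.append([a * f + rng.gauss(0.0, noise_sd) for a in loadings])
    return rows

def covariance(X):
    """Sample covariance matrix of the columns of X."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    return [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
            for a in range(p)]

def top_eigenvalue(S, iters=500):
    """Leading eigenvalue of a symmetric PSD matrix via power iteration."""
    p = len(S)
    v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary non-symmetric start
    for _ in range(iters):
        w = [sum(S[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient v' S v for the (unit-length) final vector
    return sum(v[a] * S[a][b] * v[b] for a in range(p) for b in range(p))

def rank_one_share(X):
    """Fraction of total variance explained by the best rank-one summary."""
    S = covariance(X)
    total = sum(S[j][j] for j in range(len(S)))
    return top_eigenvalue(S) / total if total > 0 else 0.0

a = [1.0, 0.8, 1.2, 0.9]    # hypothetical probe loadings, subgroup 1
b = [1.0, -0.8, 1.2, -0.9]  # hypothetical probe loadings, subgroup 2

one_group = simulate([a], 200, 0.1, seed=1)      # a single factor fits everyone
two_groups = simulate([a, b], 100, 0.1, seed=1)  # subgroups load differently

print(rank_one_share(one_group))   # close to 1: one summary is adequate
print(rank_one_share(two_groups))  # noticeably lower: one summary loses signal
```

In this toy setting, a share well below 1 signals that a single expression index discards structure, which is the kind of situation a model with subgroup-specific factor structure is meant to capture.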
Acknowledgements
This study is partially supported by the Natural Science Foundation of China Grants 11631003, 11690012, 11771072 and 11371083. The authors thank three referees for their helpful comments that led to an improvement of the paper.
Electronic supplementary material
This article contains supplementary material. In the supplement we provide the detailed proofs for the theorems in Appendix A, the estimation process in Appendix B, and additional results for real data analysis in Appendix C (PDF, 331 KB).
About this article
Cite this article
Yuan, C., Zhu, W., He, X. et al. A mixture factor model with applications to microarray data. TEST 28, 60–76 (2019). https://doi.org/10.1007/s11749-018-0585-3
DOI: https://doi.org/10.1007/s11749-018-0585-3