A mixture factor model with applications to microarray data

Yuan, Chaofeng; Zhu, Wensheng; He, Xuming; Guo, Jianhua

doi:10.1007/s11749-018-0585-3

Original Paper
Published: 11 May 2018

TEST, volume 28, pages 60–76 (2019)

Chaofeng Yuan¹, Wensheng Zhu¹, Xuming He² & Jianhua Guo¹

Abstract

Investigators routinely use unidimensional summaries for multidimensional data. In microarray data analysis, for example, the gene expression level is indeed a unidimensional summary of probe-level or SNP measurements. In this paper, we propose a mixture factor model for the low-level data, which enables us to examine the adequacy of a unidimensional summary while accommodating known or latent subgroups in the population. We also develop screening procedures based on the proposed model to identify potentially informative genes in biomedical studies. As shown in our empirical studies, the proposed methods are often more effective than existing methods because the new model goes beyond the conventional unidimensional summaries of gene expressions.

Acknowledgements

This study is partially supported by the Natural Science Foundation of China Grants 11631003, 11690012, 11771072 and 11371083. The authors thank three referees for their helpful comments that led to an improvement of the paper.

Author information

Authors and Affiliations

Key Laboratory of Applied Statistics of MOE and School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, China
Chaofeng Yuan, Wensheng Zhu & Jianhua Guo
Department of Statistics, University of Michigan, Ann Arbor, MI, 48109, USA
Xuming He

Corresponding author

Correspondence to Jianhua Guo.

Electronic supplementary material

This article contains supplementary material, which provides detailed proofs of the theorems in Appendix A, the estimation process in Appendix B, and additional results for the real data analysis in Appendix C (PDF, 331 KB).

About this article

Cite this article

Yuan, C., Zhu, W., He, X. et al. A mixture factor model with applications to microarray data. TEST 28, 60–76 (2019). https://doi.org/10.1007/s11749-018-0585-3

Received: 30 August 2017
Accepted: 26 April 2018
Published: 11 May 2018
Issue Date: 12 March 2019
DOI: https://doi.org/10.1007/s11749-018-0585-3
