A mixture factor model with applications to microarray data

Original Paper · TEST

Abstract

Investigators routinely use unidimensional summaries for multidimensional data. In microarray data analysis, for example, the gene expression level is a unidimensional summary of probe-level or SNP measurements. In this paper, we propose a mixture factor model for the low-level data, which enables us to examine the adequacy of a unidimensional summary while accommodating known or latent subgroups in the population. We also develop screening procedures based on the proposed model to identify potentially informative genes in biomedical studies. In our empirical studies, the proposed methods are often more effective than existing methods, because the new model goes beyond conventional unidimensional summaries of gene expression.
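The abstract's central question, whether one number per gene adequately represents its probe-level measurements, can be illustrated with a much simpler diagnostic than the paper's model. The sketch below is a stand-in under stated assumptions, not the authors' mixture factor model, estimation procedure, or screening statistics: it takes the SVD of the centered probe-level matrix for one gene (a rank-one, single-factor approximation) and reports the share of variation the leading component explains, both pooled and within known subgroups. The function names `rank1_share` and `rank1_share_by_group` and the simulated data are hypothetical.

```python
import numpy as np


def rank1_share(y):
    """Fraction of centered variation captured by the leading singular component.

    y: (n_samples, n_probes) array of probe-level log-intensities for one gene.
    A value near 1 suggests a single factor (a unidimensional summary)
    describes the probe-level data well. Simplified diagnostic, not the
    paper's mixture factor model.
    """
    y = np.asarray(y, dtype=float)
    yc = y - y.mean(axis=0, keepdims=True)    # center each probe across samples
    s = np.linalg.svd(yc, compute_uv=False)   # singular values, largest first
    total = float(np.sum(s ** 2))
    return float(s[0] ** 2) / total if total > 0 else 1.0


def rank1_share_by_group(y, groups):
    """Apply rank1_share separately within each known subgroup label."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    return {g.item(): rank1_share(y[groups == g]) for g in np.unique(groups)}


# Hypothetical simulated data: two subgroups whose probes load on different
# directions, so one pooled factor fits worse than one factor per subgroup.
rng = np.random.default_rng(0)
lam_a = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
lam_b = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
mu = np.array([5.0, 6.0, 7.0, 8.0, 9.0])
y = np.vstack([mu + np.outer(rng.normal(size=20), lam_a),
               mu + np.outer(rng.normal(size=20), lam_b)])
groups = ["A"] * 20 + ["B"] * 20

print("pooled:", rank1_share(y))
print("by group:", rank1_share_by_group(y, groups))
```

With this construction the within-group shares are essentially 1 while the pooled share is noticeably lower, mirroring a case where a single pooled summary is inadequate but subgroup-specific summaries fit well. For latent (unknown) subgroups the labels would themselves have to be estimated, for example with an EM algorithm, which this sketch does not attempt.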

Acknowledgements

This study is partially supported by the Natural Science Foundation of China Grants 11631003, 11690012, 11771072 and 11371083. The authors thank three referees whose helpful comments improved the paper.

Author information

Correspondence to Jianhua Guo.

Electronic supplementary material

The supplementary material (PDF, 331 KB) contains detailed proofs of the theorems (Appendix A), the estimation procedure (Appendix B), and additional results from the real data analysis (Appendix C).

About this article

Cite this article

Yuan, C., Zhu, W., He, X. et al. A mixture factor model with applications to microarray data. TEST 28, 60–76 (2019). https://doi.org/10.1007/s11749-018-0585-3

  • DOI: https://doi.org/10.1007/s11749-018-0585-3