CytoFA: Automated Gating of Mass Cytometry Data via Robust Skew Factor Analyzers

  • Sharon X. Lee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11439)


Cytometry plays an important role in the clinical diagnosis and monitoring of lymphomas, leukaemia, and AIDS. However, analysis of modern-day cytometric data is challenging. Besides being high-throughput and high-dimensional, these data typically exhibit complex characteristics such as multimodality, asymmetry, heavy-tailedness, and other non-normal features. This paper presents cytoFA, a novel data mining approach capable of clustering and performing dimensionality reduction of high-dimensional cytometry data. Our approach is also robust against non-normal features, including heterogeneity, skewness, and outliers (dead cells), that are typical in flow and mass cytometry data. Based on a statistical approach with well-studied properties, cytoFA adopts a mixture of factor analyzers (MFA) model to learn latent nonlinear low-dimensional representations of the data and to provide an automatic segmentation of the data into its constituent cell populations. We also introduce a double trimming approach to help identify atypical observations and to reduce computation time. The effectiveness of our approach is demonstrated on two large mass cytometry datasets, where it outperforms existing benchmark algorithms. We note that while the approach is motivated by cytometric data analysis, it is applicable and useful for modelling data from other fields.
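To make the two ingredients named in the abstract more concrete, the following is a minimal sketch and not the cytoFA implementation. It assumes a plain Gaussian mixture of factor analyzers (component covariance B Bᵀ + D with a diagonal D) in place of the paper's skew factor analyzers, and a single likelihood-based trimming step in place of the double trimming procedure, whose details are not given here. The function names `mfa_log_density` and `trim_mask`, the parameter values, and the simulated data are all hypothetical.

```python
import numpy as np


def mfa_log_density(Y, weights, means, loadings, noise_vars):
    """Log-density of each row of Y under a Gaussian mixture of factor analyzers.

    Component i has mean mu_i and covariance B_i B_i^T + diag(d_i),
    where B_i is a p x q loading matrix with q < p.
    """
    p = Y.shape[1]
    comp_logs = []
    for w, mu, B, d in zip(weights, means, loadings, noise_vars):
        cov = B @ B.T + np.diag(d)          # low-rank plus diagonal covariance
        diff = Y - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        comp_logs.append(np.log(w) - 0.5 * (p * np.log(2 * np.pi) + logdet + maha))
    comp_logs = np.stack(comp_logs, axis=1)
    # Log-sum-exp over components, stabilised by subtracting the row maximum.
    peak = comp_logs.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(comp_logs - peak).sum(axis=1, keepdims=True))).ravel()


def trim_mask(log_dens, trim_fraction):
    """Boolean mask that drops the lowest-likelihood fraction of observations."""
    n_trim = int(np.floor(trim_fraction * len(log_dens)))
    keep = np.ones(len(log_dens), dtype=bool)
    if n_trim > 0:
        keep[np.argsort(log_dens)[:n_trim]] = False
    return keep


# Simulated data (hypothetical): two populations in p = 5 dimensions with
# q = 2 latent factors each, plus one gross outlier standing in for a dead cell.
rng = np.random.default_rng(0)
p, q, n_per = 5, 2, 100
weights = [0.5, 0.5]
means = [np.zeros(p), np.full(p, 4.0)]
loadings = [rng.normal(size=(p, q)) for _ in means]
noise_vars = [np.full(p, 0.1) for _ in means]

cells = [
    mu + rng.normal(size=(n_per, q)) @ B.T + rng.normal(size=(n_per, p)) * np.sqrt(d)
    for mu, B, d in zip(means, loadings, noise_vars)
]
outlier = np.full((1, p), 50.0)
Y = np.vstack(cells + [outlier])

log_dens = mfa_log_density(Y, weights, means, loadings, noise_vars)
keep = trim_mask(log_dens, trim_fraction=0.01)
print(f"kept {keep.sum()} of {len(keep)} observations")
```

In a full fitting procedure the trimming mask would typically be recomputed inside each iteration of the estimation algorithm, so that trimmed observations do not influence the parameter updates; here the true parameters are used directly to keep the sketch short.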

Supplementary material

482290_1_En_40_MOESM1_ESM.pdf (51 kb)
Supplementary material 1 (pdf 51 KB)



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Mathematics and Physics, University of Queensland, Brisbane, Australia
