Determining the number of factors for non-negative matrix and its application in source apportionment of air pollution in Singapore

  • Mei Yan
  • Xiaojie Yang
  • Weiqiang Hang
  • Yingcun Xia (corresponding author)
Original Paper


Non-negative matrix factorization has been used in many research disciplines, and the number of factors plays a crucial role in it. However, a fully data-driven method for determining this number is not yet available in the literature. Based on the premise that the most appropriate number of factors should give the best prediction, in this paper we propose a selection method using a two-step delete-one-out approach, called twice cross-validation. The method is easy to implement and fully data-driven. It also works when constraints, including sparsity, are imposed on the factorization. Extensive simulations and real data analyses suggest that the proposed method performs well in most cases and selects the number of factors correctly when the number of factors is much smaller than the dimension of the variables and the sample size is reasonably large. As an important application, the proposed method is used for source apportionment of air pollution in Singapore, where it provides physically reasonable source profiles.
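The abstract does not spell out the two steps, so the following is only a minimal sketch of one plausible reading of a two-step delete-one-out criterion, not the authors' implementation. It assumes that step 1 deletes one sample (row) and fits the source profiles H on the remaining rows, and that step 2 deletes one variable of the held-out row at a time, estimates that row's non-negative loadings from the other variables, and predicts the deleted value; the number of factors with the smallest mean squared prediction error is selected. The helper names (`nmf_mu`, `predict_deleted_entries`, `twice_cv_error`, `select_rank`) are hypothetical, the factorization uses plain Lee-Seung multiplicative updates swapped in for illustration, and no sparsity or other constraints are imposed. It requires NumPy.

```python
import numpy as np


def nmf_mu(X, k, n_iter=500, seed=0, eps=1e-10):
    """Unconstrained NMF, X ~ W @ H, via Lee-Seung multiplicative updates (illustrative solver)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.uniform(0.1, 1.0, size=(n, k))
    H = rng.uniform(0.1, 1.0, size=(k, p))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def predict_deleted_entries(x, H, n_iter=300, eps=1e-10):
    """Given a held-out row x (length p) and fixed profiles H (k x p), predict
    each x[j] using only the other p - 1 entries of x.

    Row j of Wj holds non-negative loadings fitted with column j masked out;
    all p masked least-squares fits are run together with multiplicative updates."""
    k, p = H.shape
    M = 1.0 - np.eye(p)                 # M[j] is a 0/1 mask that drops column j
    num = (M * x) @ H.T                 # (p, k), fixed across iterations
    Wj = np.full((p, k), 0.5)
    for _ in range(n_iter):
        den = (M * (Wj @ H)) @ H.T + eps
        Wj *= num / den
    return np.diag(Wj @ H)              # prediction of x[j] uses w_j and H[:, j]


def twice_cv_error(X, k):
    """Mean squared prediction error for k factors under the assumed two-step scheme."""
    n, p = X.shape
    sse = 0.0
    for i in range(n):
        _, H = nmf_mu(np.delete(X, i, axis=0), k)   # step 1: delete row i, fit profiles
        pred = predict_deleted_entries(X[i], H)     # step 2: delete one entry of row i at a time
        sse += np.sum((X[i] - pred) ** 2)
    return sse / (n * p)


def select_rank(X, candidates):
    """Return the candidate number of factors with the smallest CV error, plus all errors."""
    errors = {k: twice_cv_error(X, k) for k in candidates}
    return min(errors, key=errors.get), errors


# Toy example (synthetic): 30 samples of 8 species generated from 2 non-negative sources.
rng = np.random.default_rng(1)
W_true = rng.uniform(0.0, 1.0, size=(30, 2))
H_true = rng.uniform(0.0, 1.0, size=(2, 8))
X = np.maximum(W_true @ H_true + 0.01 * rng.normal(size=(30, 8)), 0.0)
best_k, errors = select_rank(X, candidates=(1, 2, 3))
print(best_k, errors)
```

On this synthetic data, one factor should give a clearly larger error than two; numbers of factors at or above the true one tend to give similar errors. Because every candidate requires n refits, deleting blocks of rows (K-fold) instead of single rows is a natural way to reduce the cost on larger data sets.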


Keywords: Air pollution; Cross-validation; Factor model; Non-negative matrix; Source apportionment



We are most grateful to the AE and two referees for their valuable comments and constructive suggestions, which have led to a substantial improvement of this paper. YC Xia's research is partially supported by MOE Tier 1 Grant R-155-000-193-114, MOE Grant of Singapore MOE2014-T2-1-072, and National Natural Science Foundation of China grant 11771066.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Mei Yan (1)
  • Xiaojie Yang (2)
  • Weiqiang Hang (3)
  • Yingcun Xia (1, 3), corresponding author

  1. School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, China
  2. School of Mathematics, Statistics and Physics, Newcastle University, Newcastle, UK
  3. Department of Statistics and Applied Probability, National University of Singapore, Singapore
