Determining the number of factors for non-negative matrix and its application in source apportionment of air pollution in Singapore
Non-negative matrix factorization has been used in many research disciplines, and the number of factors plays a crucial role in it. However, a fully data-driven method for determining this number is not yet available in the literature. Based on the principle that the most appropriate number of factors should yield the best prediction, this paper proposes a selection method using a two-step delete-one-out approach, called twice cross-validation. The method is easy to implement and fully data-driven. It also works when constraints, including sparsity, are imposed on the factorization. Extensive simulations and real data analyses suggest that the proposed method performs well in most cases and selects the number of factors correctly when that number is much smaller than the dimension of the variables and the sample size is reasonably large. As an important application, the proposed method is used for source apportionment of air pollution in Singapore, where it provides physically reasonable source profiles.
Keywords: Air-pollution, Cross-validation, Factor model, Non-negative matrix, Source apportionment
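The principle stated in the abstract, that the right number of factors is the one that predicts best, can be illustrated with a generic hold-out scheme. The sketch below is not the paper's twice cross-validation procedure, whose details are not given here. It is a simpler entry-wise hold-out, assuming squared-error loss and multiplicative updates restricted to the observed entries. The function names `masked_nmf` and `select_rank` and all parameter defaults are hypothetical choices for illustration.

```python
import numpy as np

def masked_nmf(X, mask, rank, n_iter=300, seed=0, eps=1e-9):
    """Fit X ~ W @ H with W, H >= 0 using only entries where mask is True.

    Uses multiplicative updates for the masked squared-error loss.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, p)) + 0.1
    M = mask.astype(float)
    MX = M * X  # held-out entries contribute nothing to the fit
    for _ in range(n_iter):
        W *= (MX @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ MX) / (W.T @ (M * (W @ H)) + eps)
    return W, H

def select_rank(X, ranks, holdout_frac=0.1, n_splits=5, seed=0):
    """Return the rank with the smallest mean held-out squared error."""
    rng = np.random.default_rng(seed)
    # Draw the training masks once so every rank sees the same splits.
    masks = [rng.random(X.shape) >= holdout_frac for _ in range(n_splits)]
    errors = {}
    for r in ranks:
        split_errors = []
        for s, train in enumerate(masks):
            W, H = masked_nmf(X, train, r, seed=s)
            resid = (X - W @ H)[~train]  # prediction error on hidden entries
            split_errors.append(np.mean(resid ** 2))
        errors[r] = float(np.mean(split_errors))
    best = min(errors, key=errors.get)
    return best, errors
```

Calling `select_rank(X, ranks=range(1, 6))` on a non-negative data matrix `X` returns the candidate rank with the lowest held-out error together with the error for each rank. Too few factors underfit the hidden entries, while too many tend to fit noise in the observed ones, so the held-out error is expected to be smallest near the appropriate number.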
We are most grateful to the Associate Editor and two referees for their valuable comments and constructive suggestions, which have led to a substantial improvement of this paper. YC Xia’s research is partially supported by MOE Tier 1 Grant R-155-000-193-114, MOE Grant of Singapore MOE2014-T2-1-072, and National Natural Science Foundation of China Grant 11771066.