Seasonal Disorder in Urban Traffic Patterns: A Low Rank Analysis


This article proposes several advances to sparse nonnegative matrix factorization (SNMF) as a way to identify large-scale patterns in urban traffic data. The input to our model is traffic counts organized by time and location. Nonnegative matrix factorization additively decomposes this information, organized as a matrix, into a linear sum of temporal signatures. Penalty terms encourage this factorization to concentrate on only a few temporal signatures, with weights which are not too large. Our interest here is to quantify and compare the regularity of traffic behavior, particularly across different broad temporal windows. In addition to the rank and error, we adapt a measure introduced by Hoyer to quantify sparsity in the representation. Combining these, we construct several curves which quantify error as a function of rank (the number of possible signatures) and sparsity; as rank goes up and sparsity goes down, the approximation can be better and the error should decrease. Plotting several such curves, each corresponding to a different time window, gives a way to compare disorder/order across time windows. In this paper, we apply our algorithms and procedures to study a taxi traffic dataset from New York City. In this dataset, we find weekly periodicity in the signatures, which gives us an additional framework for identifying outliers as significant deviations from weekly medians. We then apply our seasonal disorder analysis to the New York City traffic data with seasonal (spring, summer, fall, winter) time windows, and we do find seasonal differences in traffic order.


  1. Asif MT, Kannan S, Dauwels J, Jaillet P (2013) Data compression techniques for urban traffic data. In: 2013 IEEE symposium on computational intelligence in vehicles and transportation systems (CIVTS), pages 44–49. IEEE

  2. Ahmadi P, Kaviani R, Gholampour I, Tabandeh M (2015) Modeling traffic motion patterns via non-negative matrix factorization. In: 2015 IEEE international conference on signal and image processing applications (ICSIPA), pages 214–219. IEEE

  3. Alonso-Mora J, Samaranayake S, Wallar A, Frazzoli E, Rus D (2017) On-demand high-capacity ride-sharing via dynamic trip-vehicle assignment. In: Proceedings of the National Academy of Sciences, page 201611675

  4. Ban XJ, Hao P, Sun Z (2011) Real time queue length estimation for signalized intersections using travel times from mobile sensors. Trans Res Part C 19(6):1133–1156

  5. Boquet G, Morell A, Serrano J, Vicario JL (2020) A variational autoencoder solution for road traffic forecasting systems: missing data imputation, dimension reduction, model selection and anomaly detection. Trans Res Part C 115:102622

  6. Brunet J-P, Tamayo P, Golub TR, Mesirov JP (2004) Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci 101(12):4164–4169

  7. Chagoyen M, Carmona-Saez P, Shatkay H, Carazo JM, Pascual-Montano A (2006) Discovering semantic features in the literature: a foundation for building functional associations. BMC Bioinformatics 7(1):41

  8. Chen X, He Z, Sun L (2019) A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Trans Res Part C 98:73–84

  9. Cazabet R, Jensen P, Borgnat P (2018) Tracking the evolution of temporal patterns of usage in bicycle-sharing systems using nonnegative matrix factorization on multiple sliding windows. Int J Urban Sci 22(2):147–161

  10. Carmona-Saez P, Pascual-Marqui RD, Tirado F, Carazo JM, Pascual-Montano A (2006) Biclustering of gene expression data by non-smooth non-negative matrix factorization. BMC Bioinformatics 7(1):78

  11. Carrasco DR, Tonon G, Huang Y, Zhang Y, Sinha R, Feng B, Stewart JP, Zhan F, Khatry D, Protopopova M et al (2006) High-resolution genomic profiles define distinct clinico-pathogenetic subgroups of multiple myeloma patients. Cancer cell 9(4):313–325

  12. Deri JA, Moura JMF (2015) Taxi data in new york city: a network perspective. In: 2015 49th asilomar conference on signals, systems and computers, pages 1829–1833, Nov 2015

  13. Donovan B, Mori A, Agrawal N, Meng Y, Lee J, Work D (2016) New York City hourly traffic estimates (2010–2013).

  14. Dueck D, Morris QD, Frey BJ (2005) Multi-way clustering of microarray data using probabilistic sparse matrix factorization. Bioinformatics 21(suppl-1):i144–i151

  15. Donovan B, Work DB (2015) Using coarse GPS data to quantify city-scale transportation system resilience to extreme events. arXiv preprint arXiv:1507.06011

  16. Djenouri Y, Zimek A, Chiarandini M (2018) Outlier detection in urban traffic flow distributions. In: 2018 IEEE international conference on data mining (ICDM), pages 935–940

  17. Ermagun A, Levinson D (2018) Spatiotemporal traffic forecasting: review and proposed directions. Trans Rev 38(6):786–814

  18. Ferreira N, Poco J, Vo HT, Freire J, Silva CT (2013) Visual exploration of big spatio-temporal urban data: a study of new york city taxi trips. IEEE Trans Vis Comput Graph 19(12):2149–2158

  19. Gao Y, Church G (2005) Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics 21(21):3970–3975

  20. Guan X, Chen C, Work D (2016) Tracking the evolution of infrastructure systems and mass responses using publicly available data. PloS one 11(12):e0167267

  21. Geroliminis N, Daganzo CF (2008) Existence of urban-scale macroscopic fundamental diagrams: some experimental findings. Trans Res Part B 42(9):759–770

  22. Guo J, Huang W, Williams BM (2015) Real time traffic flow outlier detection using short-term traffic conditional variance prediction. Trans Res Part C 50:160–172

  23. Gong Y, Li Z, Zhang J, Liu W, Zheng Y, Kirsch C (2018) Network-wide crowd flow prediction of Sydney trains via customized online non-negative matrix factorization. In: Proceedings of the 27th ACM international conference on information and knowledge management, pages 1243–1252. ACM

  24. Hofleitner A, Herring R, Bayen A, Han Y, Moutarde F, De La Fortelle A (2012) Large scale estimation of arterial traffic and structural analysis of traffic patterns using probe vehicles. In: Transportation Research Board 91st Annual Meeting (TRB’2012)

  25. Han Y, Moutarde F (2011) Analysis of network-level traffic states using locality preservative non-negative matrix factorization. pages 501–506, 10

  26. Han Y, Moutarde F (2013) Statistical traffic state analysis in large-scale transportation networks using locality-preserving non-negative matrix factorisation. IET Intell Trans Syst 7(3):283–295

  27. Han Y, Moutarde F (2016) Analysis of large-scale traffic dynamics in an urban transportation network using non-negative tensor factorization. Int J Intell Trans Syst Res 14(1):36–49

  28. Hoyer PO (2002) Non-negative sparse coding. In: Neural Networks for Signal Processing, 2002. Proceedings of the 2002 12th IEEE Workshop on, pages 557–565. IEEE

  29. Hoyer PO (2004) Non-negative matrix factorization with sparseness constraints. J Mach Learn Res 5(Nov):1457–1469

  30. Herman R, Prigogine I (1979) A two-fluid approach to town traffic. Science 204(4389):148–151

  31. Hagen R, Roch S, Silbermann B (2000) C*-algebras and numerical analysis. CRC Press, Boca Raton

  32. Ito K, Ito M, Miyazaki K, Tanimoto K, Sezaki K (2017) Data analysis on train transportation data with nonnegative matrix factorization. In: 2017 IEEE international conference on big data (Big Data), pages 4080–4085. IEEE

  33. Krichene W, Castillo MS, Bayen A (2016) On social optimal routing under selfish learning. In: IEEE transactions on control of network systems

  34. Kim H, Park H (2007) Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics 23(12):1495–1502

  35. Kim H, Park H (2008) Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM J Matrix Anal Appl 30(2):713–730

  36. Kim PM, Tidor B (2003) Subsystem identification through dimensionality reduction of large-scale gene expression data. Genome Res 13(7):1706–1718

  37. Karve V, Yager D, Abolhelm M, Work D, Sowers R NYC Traffic Patterns cSNMF Source Code.

  38. Liu Z, Cao J, Yang J, Wang Q (2017) Discovering dynamic patterns of urban space via semi-nonnegative matrix factorization. In: 2017 IEEE international conference on big data (Big Data), pages 3447–3453. IEEE

  39. Lv Y, Duan Y, Kang W, Li Z, Wang FY (2015) Traffic flow prediction with big data: a deep learning approach. IEEE Trans Intell Trans Syst 16(2):865–873

  40. Li Stan Z, Hou XW, Zhang HJ, Cheng QS (2001) Learning spatially localized, parts-based representation. In: Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I–I. IEEE

  41. Li Q, Jianming H, Yi Z (2007) A flow volumes data compression approach for traffic network based on principal component analysis. In: 2007 IEEE intelligent transportation systems conference, pages 125–130

  42. Lee T, Matsushima S, Yamanishi K (2016) Traffic risk mining using partially ordered non-negative matrix factorization. In: 2016 IEEE international conference on data science and advanced analytics (DSAA), pages 622–631. IEEE

  43. Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Advances in neural information processing systems, pages 556–562

  44. Li L, Xiaonan S, Zhang Y, Lin Y, Li Z (2015) Trend modeling for traffic time series analysis: an integrated study. IEEE Trans Intell Trans Syst 16(6):3430–3439

  45. Maher EA, Brennan C, Wen PY, Durso L, Ligon KL, Richardson A, Khatry D, Feng B, Sinha R, Louis DN et al (2006) Marked genomic differences characterize primary and secondary glioblastoma subtypes and identify two distinct molecular and clinical secondary glioblastoma entities. Cancer Res 66(23):11502–11513

  46. Ma X, Li Y, Chen P (2018) Identifying spatiotemporal traffic patterns in large-scale urban road networks using a modified nonnegative matrix factorization algorithm. Journal of Traffic and Transportation Engineering (English Edition)

  47. Mahmassani HS, Williams JC, Herman R (1984) Investigation of network-level traffic flow relationships: some simulation results. Trans Res Record 971:121–130

  48. Nagy AM, Simon V (2018) Survey on traffic prediction in smart cities. Pervasive Mobile Comput 50:148–163

  49. Pavlyuk D (2019) Feature selection and extraction in spatiotemporal traffic forecasting: a systematic literature review. Euro Trans Res Rev 11(1):6

  50. Caltrans Performance Measurement System.

  51. Pascual-Montano A, Carazo JM, Kochi K, Lehmann D, Pascual-Marqui RD (2006) Nonsmooth nonnegative matrix factorization (nsNMF). IEEE Trans Pattern Anal Mach Intell 28(3):403–415

  52. Paul Pauca V, Piper J, Plemmons RJ (2006) Nonnegative matrix factorization for spectral data analysis. Linear algebra and its applications 416(1):29–47

  53. Pauca VP, Shahnaz F, Berry MW, Plemmons RJ (2004) Text mining using non-negative matrix factorizations. In: Proceedings of the 2004 SIAM international conference on data mining, pages 452–456. SIAM

  54. Pehkonen P, Wong G, Törönen P (2005) Theme discovery from gene lists for identification and viewing of multiple functional groups. BMC bioinformatics 6(1):162

  55. Lijun S, Kay WA. Understanding urban mobility patterns with a probabilistic tensor factorization framework. Transportation Research Part B: Methodological, 91:511–524

  56. Hongzhi W, Mohamed JB, Mohamed H. Progress in outlier detection techniques: A survey. IEEE Access, 7:107964–108000

  57. Wisconsin worries: Labor rallies in NY.

  58. NY subway system shuts down due to Hurricane Irene (updated).

  59. Xu L, Wang Y, Yu H, Li H (2015) Feature extraction of urban traffic network data based on locally sensitive discriminant analysis algorithm

  60. Yangyang X, Yin W, Wen Z, Zhang Y (2012) An alternating direction algorithm for matrix completion with nonnegative factors. Front Math China 7(2):365–384

  61. Yang S, Qian S (2019) Understanding and predicting travel time with spatio-temporal features of network traffic flow, weather and incidents. IEEE Intell Trans Syst Mag 11(3):12–28

  62. Zhang Z, He Q, Tong H, Gou J, Li X (2016) Spatial-temporal traffic flow pattern identification and anomaly detection with dictionary-based compression theory in a large-scale urban network. Trans Res Part C 71:284–302

  63. Zheng J, Liu HX (2017) Estimating traffic volumes for signalized intersections using connected vehicle data. Trans Res Part C 79:347–362

  64. Yuan Z, Kaan O, Kun X, Hong Y (2016) Using big data to study resilience of taxi and subway trips for hurricanes sandy and irene. Trans Res Record 2599:70–80

  65. Zhan X, Ukkusuri SV, Zhu F (2014) Inferring urban land use using large-scale social media check-in data. Netw Spatial Econ 14(3):647–667

  66. Zhang S, Wang W, Ford J, Makedon F. Learning from incomplete ratings using non-negative matrix factorization, pages 549–553

Author information



Corresponding author

Correspondence to Richard B. Sowers.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors acknowledge the Program for Interdisciplinary and Industrial Internships at Illinois (PI4) and the Illinois Geometry Laboratory  (IGL). The many IGL students who have made invaluable contributions to this work are: Raghav Bakshi, James Kerns, Xinyi Li, Xinyu Liu, Yicheng Pu, Gabriel Shindnes, Haozhe Wang, Jing Wang, Ziying Wang, Yu Wu, Zeyu Wu, Bin Xu, and Dajun Xu. The authors would also like to thank the Siebel Energy Institute for its support of this work. This material is based upon work supported by the National Science Foundation under Grant Numbers CMMI 1727785 and DMS 1345032. This work was also supported by a grant from the Siebel Energy Institute. The code for this work is at

Sandia National Laboratories is a multimission laboratory operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration. Sandia Labs has major research and development responsibilities in nuclear deterrence, global security, defense, energy technologies and economic competitiveness, with main facilities in Albuquerque, New Mexico, and Livermore, California.

This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.



For completeness, let’s write down the calculations leading to the algorithm of Sect. 2.5.

Writing out \({\mathcal{E}}_{\beta ,\eta }\) of (5), we get

$$\begin{aligned}&{\mathcal{E}}_{\beta ,\eta }(W,H)= \sum _{(t,\ell )\in {\mathcal{I}}}\left| D_{t,\ell }-(WH)_{t,\ell }\right| ^2 \\&\quad + \beta \sum _{\ell =1}^L\left( \sum _{n=1}^N H_{n,\ell }\right) ^2 + \eta \sum _{n=1}^N\sum _{t=1}^T W_{t,n}^2. \end{aligned}$$

We seek to minimize this by alternating between minimization problems in W and H. Namely, if we start with a fixed \((W,H)\in \mathbb {R}^{T\times N}_+\times \mathbb {R}^{N\times L}_+\), we can construct a descent step for the function \({\mathcal{E}}_{\beta ,\eta }(W,\cdot )\) and then, letting \(H'\) be the result, we can construct a descent step for \({\mathcal{E}}_{\beta ,\eta }(\cdot ,H')\). This should decrease the value of \({\mathcal{E}}_{\beta ,\eta }\), and we can then proceed iteratively.
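As a concrete reading of the objective, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes the observed index set \({\mathcal{I}}\) is supplied as a boolean array `mask` of shape T×L, and it takes the penalty on H column by column (sum over n inside the square, sum over \(\ell\) outside), which is the form consistent with the \(2\beta \mathbf {1}_{N\times N}H\) term in the gradient used later in this appendix. The names `objective` and `mask` are hypothetical.

```python
import numpy as np

def objective(D, mask, W, H, beta, eta):
    """Penalized squared error over the observed entries (hypothetical helper).

    D    : (T, L) data matrix; entries where mask is False are ignored
    mask : (T, L) boolean array standing in for the index set I (assumption)
    W    : (T, N) nonnegative temporal signatures
    H    : (N, L) nonnegative weights
    """
    # Residual restricted to observed entries; unobserved entries contribute 0.
    resid = np.where(mask, D - W @ H, 0.0)
    fit = np.sum(resid ** 2)
    # Squared column sums of H: sum over l of (sum over n of H[n, l])^2.
    h_penalty = beta * np.sum(H.sum(axis=0) ** 2)
    # Squared Frobenius norm of W.
    w_penalty = eta * np.sum(W ** 2)
    return fit + h_penalty + w_penalty
```

Using `np.where` rather than multiplying by the mask means missing entries of `D` may be stored as NaN without contaminating the sum.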

The gradients of \({\mathcal{E}}_{\beta ,\eta }\) in the directions of W and H are given by

$$\begin{aligned} \frac{\partial {\mathcal{E}}_{\beta ,\eta }}{\partial W_{\hat{t},\hat{n}}}(W,H)&=-2 \sum _{\ell : (\hat{t},\ell )\in {\mathcal{I}}}\left( D_{\hat{t},\ell }-\sum _{n=1}^N W_{\hat{t},n}H_{n,\ell }\right) H_{\hat{n},\ell }\\&\quad +2\eta W_{\hat{t},\hat{n}}\\&= -2 \left( \left[ D-WH\right] _{\mathcal{I}}H^T-\eta W\right) _{\hat{t},\hat{n}}, \end{aligned}$$


$$\begin{aligned} \frac{\partial {\mathcal{E}}_{\beta ,\eta }}{\partial H_{\hat{n},\hat{\ell }}}(W,H)&=-2 \sum _{t: (t,\hat{\ell })\in {\mathcal{I}}}\left( D_{t,\hat{\ell }}-\sum _{n=1}^N W_{t,n}H_{n,\hat{\ell }}\right) W_{t,\hat{n}} \\&\quad + 2\beta \left( \sum _{n=1}^NH_{n,\hat{\ell }}\right) \\&= -2 \left( W^T\left[ D-WH\right] _{\mathcal{I}}\right) _{\hat{n},\hat{\ell }}\\&\quad + 2\beta (\mathbf {1}_{N\times N} H)_{\hat{n},\hat{\ell }}. \end{aligned}$$
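Gradient formulas of this kind are easy to get wrong by a sign or a factor of 2, so a finite-difference check is a useful sanity test. The sketch below is an illustration under assumptions, not part of the original derivation: the index set \({\mathcal{I}}\) is a boolean `mask`, the penalty on H is taken over column sums, and the ridge term \(\eta \sum W_{t,n}^2\) contributes \(2\eta W\) to the W-gradient, so that the gradient vanishes exactly at the critical-point equations stated below. All function names are hypothetical.

```python
import numpy as np

def objective(D, mask, W, H, beta, eta):
    # Masked squared error + beta * (squared column sums of H) + eta * ||W||_F^2.
    resid = np.where(mask, D - W @ H, 0.0)
    return np.sum(resid ** 2) + beta * np.sum(H.sum(axis=0) ** 2) + eta * np.sum(W ** 2)

def gradients(D, mask, W, H, beta, eta):
    """Closed-form gradients in matrix form."""
    resid = np.where(mask, D - W @ H, 0.0)           # [D - WH]_I
    N = H.shape[0]
    grad_W = -2.0 * resid @ H.T + 2.0 * eta * W
    grad_H = -2.0 * W.T @ resid + 2.0 * beta * np.ones((N, N)) @ H
    return grad_W, grad_H

def numeric_grad(f, X, h=1e-6):
    """Central finite differences of a scalar function f at the matrix X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += h
        Xm[idx] -= h
        G[idx] = (f(Xp) - f(Xm)) / (2.0 * h)
    return G

def max_gradient_error(D, mask, W, H, beta, eta):
    """Largest absolute gap between closed-form and numerical gradients."""
    gW, gH = gradients(D, mask, W, H, beta, eta)
    nW = numeric_grad(lambda X: objective(D, mask, X, H, beta, eta), W)
    nH = numeric_grad(lambda X: objective(D, mask, W, X, beta, eta), H)
    return max(np.max(np.abs(gW - nW)), np.max(np.abs(gH - nH)))
```

Because the objective is quadratic in W for fixed H (and in H for fixed W), central differences agree with the closed form up to rounding error.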

As in Kim and Park (2008), we want to iteratively find the critical points of \({\mathcal{E}}_{\beta ,\eta }\), i.e. the solutions of

$$\begin{aligned} \left[ WH\right] _{\mathcal{I}}H^T - [D]_{\mathcal{I}}H^T + \eta W&= 0\\ W^T\left[ WH\right] _{\mathcal{I}}- W^T[D]_{\mathcal{I}}+ \beta \mathbf {1}_{N\times N} H&= 0 \end{aligned}$$

The above formulae suggest a multiplicative descent rule (which need not be gradient descent; see Lee and Seung (2001)). Fix \((W,H)\in \mathbb {R}_+^{T\times N}\times \mathbb {R}_+^{N\times L}\). Assume that

$$\begin{aligned} \frac{\partial {\mathcal{E}}_{\beta ,\eta }}{\partial H_{n,\ell }}>0; \end{aligned}$$

we can then decrease the value of \({\mathcal{E}}_{\beta ,\eta }\) by decreasing \(H_{n,\ell }\). We can rewrite (14) as

$$\begin{aligned} -2 \left( W^T\left[ D-WH\right] _{\mathcal{I}}\right) _{n,\ell }+ 2\beta (\mathbf {1}_{N\times N} H)_{n,\ell }>0 \end{aligned}$$

or rather

$$\begin{aligned} \left( W^T\left[ WH\right] _{\mathcal{I}}\right) _{n,\ell }+ \beta (\mathbf {1}_{N\times N} H)_{n,\ell }>\left( W^T\left[ D\right] _{\mathcal{I}}\right) _{n,\ell }. \end{aligned}$$

Since W, H, and D all have nonnegative entries, both sides of this inequality are nonnegative. The inequality can in turn be written as \({\chi _{n,\ell }^h(W,H)<1}\) where

$$\begin{aligned} \chi _{n,\ell }^h(W,H) \overset{\text {def}}{=}\frac{\left( W^T\left[ D\right] _{\mathcal{I}}\right) _{n,\ell }}{\left( W^T\left[ WH\right] _{\mathcal{I}}\right) _{n,\ell }+ \beta (\mathbf {1}_{N\times N} H)_{n,\ell }}. \end{aligned}$$

Thus, one way to decrease \(H_{n,\ell }\) while still retaining nonnegativity is to multiply it by \(\chi _{n,\ell }^h(W,H)\). Reviewing these steps, we also see that if \(\frac{\partial {\mathcal{E}}_{\beta ,\eta }}{\partial H_{n,\ell }}<0\), then \(\chi _{n,\ell }^h(W,H)>1\); in this case we want to increase \(H_{n,\ell }\), and can again do so by multiplying by \(\chi _{n,\ell }^h(W,H)\). Finally, if \(\frac{\partial {\mathcal{E}}_{\beta ,\eta }}{\partial H_{n,\ell }}=0\) (i.e., we have found a critical point), then \(\chi _{n,\ell }^h(W,H)=1\), so multiplying \(H_{n,\ell }\) by \(\chi _{n,\ell }^h(W,H)\) leaves \(H_{n,\ell }\) unchanged.
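The three cases above can be seen in a few lines of NumPy. This is a sketch under assumptions: `mask` is a boolean array standing in for \({\mathcal{I}}\), the function name `update_H` is hypothetical, and the small constant `eps` in the denominator is our addition to avoid division by zero (it does not appear in the derivation).

```python
import numpy as np

def update_H(D, mask, W, H, beta, eps=1e-12):
    """Multiply each H[n, l] by the ratio chi^h[n, l] (hypothetical helper)."""
    D_I = np.where(mask, D, 0.0)          # [D]_I
    WH_I = np.where(mask, W @ H, 0.0)     # [WH]_I
    numer = W.T @ D_I
    # (1_{NxN} H)[n, l] is the l-th column sum of H, the same for every row n,
    # so a (1, L) row of column sums broadcasts to the full (N, L) shape.
    denom = W.T @ WH_I + beta * H.sum(axis=0, keepdims=True)
    chi_h = numer / (denom + eps)
    return H * chi_h
```

For example, with a full mask, \(\beta =0\) and \(D=WH_0\) exactly, starting from \(H=H_0\) gives ratios equal to 1 (a critical point, so nothing changes), while starting from \(H=2H_0\) gives ratios equal to 1/2 and the update returns \(H_0\).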

The update rule for \(W_{t,n}\) is similar. To start, assume that

$$\begin{aligned} \frac{\partial {\mathcal{E}}_{\beta ,\eta }}{\partial W_{t,n}}>0; \end{aligned}$$

then we can decrease \({\mathcal{E}}_{\beta ,\eta }\) by decreasing \(W_{t,n}\). We can rewrite (15) as

$$\begin{aligned} -2 \left( \left[ D-WH\right] _{\mathcal{I}}H^T-\eta W\right) _{t,n}>0. \end{aligned}$$

We can again rewrite this as the comparison of two nonnegative quantities:

$$\begin{aligned} \left( \left[ WH\right] _{\mathcal{I}}H^T+\eta W\right) _{t,n} >\left( \left[ D\right] _{\mathcal{I}}H^T\right) _{t,n}. \end{aligned}$$

This in turn is equivalent to \(\chi _{t,n}^w(W,H)<1\) where

$$\begin{aligned} \chi ^w_{t,n}(W,H)\overset{\text {def}}{=}\frac{\left( \left[ D\right] _{\mathcal{I}}H^T\right) _{t,n}}{\left( \left[ WH\right] _{\mathcal{I}}H^T+\eta W\right) _{t,n}} \end{aligned}$$

In other words, we can decrease \(W_{t,n}\) by multiplying by \(\chi _{t,n}^w(W,H)\). One can similarly see that if \(\frac{\partial {\mathcal{E}}_{\beta ,\eta }}{\partial W_{t,n}}<0\), then \(\chi _{t,n}^w(W,H)>1\), so multiplying by \(\chi _{t,n}^w(W,H)\) increases \(W_{t,n}\). In both cases, the multiplicative update moves \(W_{t,n}\) in the same direction as gradient descent would.

Our proposed update rule for W and H is now

$$\begin{aligned} W'_{t,n}&=W_{t,n}\chi _{t,n}^w(W,H)\\ H'_{n,\ell }&= H_{n,\ell }\chi _{n,\ell }^h(W,H), \end{aligned}$$

which is equivalent to (6).
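Putting the two ratios together gives an iteration that can be sketched as follows. This is an illustrative sketch, not the authors' released code. It assumes a boolean `mask` for \({\mathcal{I}}\), a random positive initialization, a fixed iteration count, and a small `eps` guarding the denominators. It also assumes the alternating order described at the start of this appendix (update H first, then update W using the new H), whereas the display above evaluates both ratios at the same \((W,H)\). All function names are hypothetical.

```python
import numpy as np

def objective(D, mask, W, H, beta, eta):
    # Masked squared error + beta * (squared column sums of H) + eta * ||W||_F^2.
    resid = np.where(mask, D - W @ H, 0.0)
    return np.sum(resid ** 2) + beta * np.sum(H.sum(axis=0) ** 2) + eta * np.sum(W ** 2)

def multiplicative_step(D, mask, W, H, beta, eta, eps=1e-12):
    """One alternating multiplicative step: H first, then W with the new H."""
    D_I = np.where(mask, D, 0.0)
    # H update: H <- H * chi^h(W, H)
    WH_I = np.where(mask, W @ H, 0.0)
    chi_h = (W.T @ D_I) / (W.T @ WH_I + beta * H.sum(axis=0, keepdims=True) + eps)
    H = H * chi_h
    # W update: W <- W * chi^w(W, H'), evaluated at the updated H
    WH_I = np.where(mask, W @ H, 0.0)
    chi_w = (D_I @ H.T) / (WH_I @ H.T + eta * W + eps)
    W = W * chi_w
    return W, H

def factorize(D, mask, rank, beta, eta, n_iter=200, seed=0):
    """Run a fixed number of steps from a random positive start (assumption)."""
    rng = np.random.default_rng(seed)
    T, L = D.shape
    W = rng.random((T, rank)) + 0.1
    H = rng.random((rank, L)) + 0.1
    history = [objective(D, mask, W, H, beta, eta)]
    for _ in range(n_iter):
        W, H = multiplicative_step(D, mask, W, H, beta, eta)
        history.append(objective(D, mask, W, H, beta, eta))
    return W, H, history

if __name__ == "__main__":
    # Synthetic rank-3 data with roughly 10% of entries unobserved.
    rng = np.random.default_rng(42)
    D = rng.random((20, 3)) @ rng.random((3, 15))
    mask = rng.random(D.shape) > 0.1
    W, H, history = factorize(D, mask, rank=3, beta=0.01, eta=0.01)
    print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because every factor is a ratio of nonnegative quantities, W and H stay nonnegative without any projection step, which is the main practical appeal of the multiplicative form.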

Cite this article

Karve, V., Yager, D., Abolhelm, M. et al. Seasonal Disorder in Urban Traffic Patterns: A Low Rank Analysis. J. Big Data Anal. Transp. (2021).


  • Traffic
  • Normalization
  • Sparse nonnegative matrix factorization