1 Introduction

Spatio-temporal data covering wide areas of cities have become available due to the commoditization of sensor-monitoring systems and mobile-phone networks. These monitoring systems observe various types of data, such as vehicle counts on a road network, rental counts in a bike-sharing system, and purchasing records of shops around a city, where missing values often appear due to the failure of sensor nodes, data transmission errors, and trouble with data recording systems. Such spatio-temporal data contain rich information, but it is difficult to grasp at a glance what spatio-temporal activities appear in them. Understanding such activities via pattern extraction is therefore a typical problem in spatio-temporal data analysis, in which the interpretability of the extracted patterns is regarded as one of the most important properties of an analysis method.

Tensor factorization methods have been widely applied to discover spatial and temporal patterns from various kinds of spatio-temporal data [17]. These methods represent spatio-temporal data as a higher-order array, called a tensor, which is a generalization of a matrix. For example, we can represent spatio-temporal data as a three-way tensor whose first, second, and third modes correspond to sensor locations, timestamps over 24 h, and observed days, respectively. Fig. 1 illustrates an example of a tensor for spatio-temporal data analysis. With this formulation, we can naturally incorporate the assumption that daily or weekly periodicity can be found in the data and that similar spatial patterns appear on different days. By decomposing the tensor, we can extract a small number of spatial, temporal, and daily patterns as latent factors. However, since most existing tensor factorization methods do not exploit the non-negativity of data whose observations contain only non-negative values, they often yield messy, hard-to-interpret factors.

Fig. 1. Example of a non-negative tensor factorization method for the analysis of a traffic-flow data set, where latent patterns for the location, time, and day modes are extracted as latent factors.

Unlike those tensor factorization methods, Non-negative Tensor Factorization (NTF) [8], which leverages non-negativity, is effective for extracting interpretable patterns from non-negative data [13, 18]. This method has successfully yielded interpretable factors from various kinds of spatio-temporal data, such as location-based social network services [14, 25], mobile phone GPS logs [10], log messages of network equipment [16], and traffic records of road networks [32]. However, NTF is not directly applicable when missing values exist. To deal with missing values, NTF was recently extended to learn the latent factors from a subset of the elements of a tensor, which is called non-negative tensor completion [15, 31]; the learned latent factors can then interpolate the missing values in the data. However, NTF methods for the missing value completion problem suffer from overfitting when only a few observations are available, because they ignore spatial and temporal contextual information, such as the order of time stamps, weekly periodicity, and the distances between sensor locations, and treat each feature of the tensor independently.

To incorporate such contextual information, many matrix/tensor factorization methods have employed a graph-Laplacian-based regularizer that encourages the latent factors to be smooth with respect to spatio-temporal dependencies [21]. The graph regularized non-negative matrix factorization [6] is a variant of such schemes and has been widely utilized in many applications; however, it considers neither missing values nor higher-order arrays.

Another choice for representing such auxiliary information is structured regularizers [2], which have become popular in machine learning, signal processing, and data mining [7, 28]. For example, the fused lasso [27], also known as the total variation, approximates parameters by piecewise-constant values with respect to the order of the parameters. Since groups of its estimated parameters share the same value, it is beneficial for finding segments of parameters. In a pioneering work [29], the penalized matrix decomposition was proposed to use the fused lasso as a regularizer on latent factors and was applied to a gene data analysis problem. Its latent factors made gene segments easier to find than those of existing matrix factorization methods that incorporated the lasso regularizer. However, this method and its subsequent works have only considered the fused lasso, without incorporating more general structured regularizers such as those encoding spatial dependencies of sensors, and they also ignored non-negativity and the existence of missing values.

In this paper, we attempt to solve the problem of extracting latent factors from spatio-temporal data that contain many missing values. To tackle this problem, we propose a novel NTF that learns factors by employing spatial and temporal auxiliary information as regularizers. We utilize this information to represent phenomena that often appear in spatio-temporal data; for example, counts of vehicles passing roads smoothly increase or decrease, or take the same value, across space and time. To exploit such information, we introduce a regularizer that consists of both a graph-Laplacian regularizer and structured regularizers that incorporate not only the order of features but also more general graph- and group-based structures [3, 24]. With our regularizer, we can utilize various kinds of auxiliary information in NTF, including daily and weekly periodicity, distances between sensor locations, and areas of locations. Our proposed method is highly robust to a large portion of missing values because it encourages latent factors to be smooth and flat with respect to spatial and temporal structures, where we regard segments of parameters that take the same value as flat. To estimate the latent factors, we present an efficient optimization procedure based on the alternating direction method of multipliers [4] that solves its subproblems with the conjugate gradient method [21] and a proximity operator computed by a parametric network flow algorithm [12].

We conducted missing value interpolation experiments with real-world traffic flow data and compared the performance of our proposed method with existing NTF methods. We demonstrate that our proposed method improves the interpolation performance over existing NTF methods. We also show that our extracted factors are interpretable and useful for detecting change points: because our factors consist of segments, a boundary between segments can easily be read off as a change point.

2 Non-negative Tensor Factorization

We denote an N-way non-negative tensor as \(\mathcal {{X}} \in \mathbb {R}_{\ge 0}^{{I_1} \times \cdots \times {I_N}}\), where \(I_n\) is the number of features in the n-th mode. The n-th mode unfolding of tensor \(\mathcal {{X}}\) is denoted as \(\mathcal {{X}}_n\). We use \(i = (i_1,\dots ,i_N)\) and \(D\) to represent an element and the whole set of elements in the tensor, respectively. The subset of observed elements in the tensor is denoted by \(\varOmega = \{i \mid x_i ~\text {is observed}, \forall i \in D \}\).

NTF decomposes the observed values of tensor \(\mathcal {{X}}\) into K latent non-negative factors, where \(K \ll \min (I_1,\dots , I_N)\). The n-th mode factor matrix is denoted as \({\varvec{A}^{(n)}} \in \mathbb {R}_{\ge 0}^{{I_n} \times {K}}\) whose k-th column is factor vector \(\varvec{a}_k^{(n)}\in \mathbb {R}_{\ge 0}^{I_n}\). We denote the whole set of factor vectors as \(A = \{\varvec{a}^{(n)}_k \mid \forall (n, k)\}\). The estimation for element \(x_i\) is given by a sum of products of latent factor entries \(\hat{x}_i = \sum _{k=1}^K a^{(1)}_{i_1,k} a^{(2)}_{i_2,k} \cdots a^{(N)}_{i_N,k} \in \hat{\mathcal {{X}}}\). We denote the transpose operator as \(^\mathrm{{\top }}\), the Khatri-Rao product as \(\odot \), and its series as \(\odot _{n=1}^N {\varvec{A}^{(n)}} = {\varvec{A}^{(1)}} \odot \dots \odot {\varvec{A}^{(N)}}\).
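As a concrete illustration of the CP estimation \(\hat{x}_i\), the following minimal numpy sketch reconstructs a toy three-way tensor from factor matrices; the sizes and rank are hypothetical, chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
I1, I2, I3, K = 4, 5, 6, 3  # toy mode sizes and rank

# Non-negative factor matrices A^(n), one per mode
A1 = rng.random((I1, K))
A2 = rng.random((I2, K))
A3 = rng.random((I3, K))

# Full CP reconstruction: X_hat[i1,i2,i3] = sum_k A1[i1,k]*A2[i2,k]*A3[i3,k]
X_hat = np.einsum('ik,jk,lk->ijl', A1, A2, A3)

# Element-wise check for a single entry i = (1, 2, 3)
x_123 = sum(A1[1, k] * A2[2, k] * A3[3, k] for k in range(K))
assert np.isclose(X_hat[1, 2, 3], x_123)
```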

The empirical loss function for NTF can be defined as a sum of divergences that indicates a discrepancy between \(x_i\) and its estimation \(\hat{x}_i\):

$$\begin{aligned} f(A) = D_{\varOmega }(\mathcal {{X}}\Vert \hat{\mathcal {{X}}})+ \sum _{n=1}^N \sum _{k=1}^K g^{(n)}({\varvec{a}^{(n)}_k}), \end{aligned}$$
(1)

where \(D_{\varOmega }(\mathcal {{X}}\Vert \hat{\mathcal {{X}}}) = \sum _{i \in \varOmega } d (x_{i} \Vert \hat{x}_i)\). We use \(d(p\Vert q)\) to denote a divergence between scalars p and q, and \(g^{(n)}\) to denote a penalty function for the n-th mode factor vectors. Because the loss function f is non-convex with respect to \(A\), the NTF problem is to obtain a local minimizer \(A^*\) of the loss under a non-negativity constraint:
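The observed-only divergence \(D_{\varOmega }\) can be sketched in numpy with a boolean mask standing in for \(\varOmega \); here we assume the squared Euclidean divergence, which is also the one used later in the experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((4, 5, 6))           # data tensor
X_hat = rng.random((4, 5, 6))       # current estimate
mask = rng.random((4, 5, 6)) < 0.3  # True on observed elements (the set Omega)

# D_Omega(X || X_hat) with the squared Euclidean divergence:
# only the observed elements contribute to the loss.
loss = np.sum((X[mask] - X_hat[mask]) ** 2)
assert loss >= 0.0
```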

$$\begin{aligned} A^* = \mathop {\mathrm{\arg min}}\limits _{A} f(A) \text { subject to }~ \varvec{a}^{(n)}_{k}\ge 0~~ \forall (n, k).&\end{aligned}$$
(2)

The graph regularized non-negative matrix factorization method [6] employs a graph Laplacian regularizer [22] to represent the smoothness of latent factors. An adjacency matrix for the n-th mode features is denoted as \(\varvec{W}^{(n)}\in \mathbb {R}^{{I_n} \times {I_n}}\); it represents a graph whose nodes correspond to the features of the n-th mode and whose edge weights correspond to similarities between pairs of features. The Laplacian matrix is \({\varvec{L}^{(n)}} = {\varvec{D}^{(n)}} - {\varvec{W}^{(n)}}\), where \({\varvec{D}^{(n)}}\) is a diagonal matrix whose elements are the row sums of \({\varvec{W}^{(n)}}\). Then a graph Laplacian regularizer can be defined:

$$\begin{aligned} g^{(n)}({\varvec{a}^{(n)}_k}) = {{\varvec{a}^{(n)}_k}}^\mathrm{{\top }}{\varvec{L}^{(n)}} {\varvec{a}^{(n)}_k}. \end{aligned}$$
(3)

This penalty function encourages smoothness because it is equivalent to placing a weighted quadratic penalty on the differences between adjacent elements.
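This equivalence can be checked numerically: the quadratic form \({\varvec{a}}^{\top }{\varvec{L}}{\varvec{a}}\) equals \(\frac{1}{2}\sum _{j,j'} w_{j,j'}(a_j - a_{j'})^2\). A small numpy sketch with a hypothetical symmetric adjacency matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2           # symmetric adjacency (pairwise similarities)
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))  # degree matrix: row sums of W on the diagonal
L = D - W                   # graph Laplacian

a = rng.random(n)
quad = a @ L @ a
# Equivalent pairwise form: (1/2) * sum_{j,j'} w_{j,j'} (a_j - a_{j'})^2
pairwise = 0.5 * np.sum(W * (a[:, None] - a[None, :]) ** 2)
assert np.isclose(quad, pairwise)
```

The penalty is therefore small exactly when features that are similar on the graph take close values, which is the smoothness the regularizer is meant to encourage.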

3 Proposed Model

We introduce a unified structured regularizer to employ both smooth and piecewise-constant properties with auxiliary structures:

$$\begin{aligned} g^{(n)}(\varvec{a}^{(n)}_{k}) = \sum _{{m}=1}^{3}\lambda _m g^{(n)}_m(\varvec{a}^{(n)}_{k}) + g^{(n)}_{\ge 0}(\varvec{a}^{(n)}_{k}), \end{aligned}$$
(4)

where \(\lambda _1,\lambda _2\) and \(\lambda _3\) are the hyperparameters for each regularizer. We employ a Generalized Fused Lasso (GFL) [5, 30] and a Higher-Order Fused Lasso (HOFL) [24] as \(g^{(n)}_1\) and \(g^{(n)}_2\), respectively. \(g^{(n)}_3\) corresponds to the Laplacian regularizer for extracting smooth patterns. We use an indicator function for the non-negative region:

$$\begin{aligned} g^{(n)}_{\ge 0}(\varvec{a}^{(n)}_{k}) = {\left\{ \begin{array}{ll} 0 &{}(\text {if}\ a_{i,k} \ge 0,~ \forall i)\\ +\infty &{}(\text {otherwise}) \end{array}\right. }. \end{aligned}$$
(5)

The GFL penalty is defined:

$$\begin{aligned}&g^{(n)}_1(\varvec{a}^{(n)}_{k}) = \sum _{{j}=1}^{I_n}\sum _{{j'}=1}^{I_n} w^{(n)}_{j,j'} \left| a^{(n)}_{j , k} - a^{(n)}_{j' , k}\right| . \end{aligned}$$
(6)
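The penalty in Eq. (6) can be evaluated directly; the sketch below uses a hypothetical weight matrix and verifies that the penalty vanishes when all parameters share one value, which is what makes the GFL favor piecewise-constant factors.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
W = rng.random((n, n))  # w_{j,j'}: hypothetical graph weights between features
a = rng.random(n)

def gfl_penalty(a, W):
    # GFL penalty of Eq. (6): weighted absolute differences over all pairs
    return np.sum(W * np.abs(a[:, None] - a[None, :]))

gfl = gfl_penalty(a, W)
assert gfl > 0.0
# The penalty vanishes exactly when all parameters share a single value
assert np.isclose(gfl_penalty(np.full(n, 2.0), W), 0.0)
```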

The GFL prefers parameters to take the same value if they are adjacent on the given graph, which can encode, e.g., distances between sensor locations or temporal lags between time stamps. The HOFL encourages parameters in a given group to take identical values [24]. With this regularizer, we can utilize auxiliary information such as sensors placed in a specific area that are expected to output similar values, or a group of time stamps at which a specific train leaves a station. We denote the r-th group of features in the n-th mode as \(g^{(n)}_r \subseteq D_n\) and a set of groups by \(\mathcal {G}^{(n)}=\{g^{(n)}_1, \cdots , g^{(n)}_{R_n}\}\), where \(D_n\) and \(R_n\) are the set of elements in the n-th mode and the number of groups, respectively. The weight of each element for the r-th group on the n-th mode is denoted by \(c^{(n)}_{r,m} = \bar{c}^{(n)}_{r,m}\) if \(m \in g^{(n)}_r\), and 0 otherwise, where \(\bar{c}^{(n)}_{r,m} > 0\). Then a simplified HOFL penalty \(g^{(n)}_2(\varvec{a}^{(n)}_{k})\) is given:

$$\begin{aligned} \sum _{r=1}^R \sum _{m=1}^{I_n} c^{(n)}_{r,j_m}| a^{(n)}_{j_m , k} - \bar{a}^{(n)}_{r,j_m,k} | + \theta ^{(n)}_r(a^{(n)}_{s_r , k}-a^{(n)}_{t_r , k}), \end{aligned}$$
(7)

where \(\theta ^{(n)}_r > 0\) is a hyperparameter that controls the consistency of the parameters in a group. \(\bar{\varvec{a}}^{(n)}_{r,k}\) is defined as \(\bar{a}^{(n)}_{r,m,k} = a^{(n)}_{s_r , k}~(\text {if}~m\ge s_r),~a^{(n)}_{t_r , k}~(\text {if}~m\le t_r)~\text {and}~a^{(n)}_{j_m , k}~(\text {otherwise})\) for distinct indices \(j_1,j_2,\ldots ,j_{I_n}\in D_n\) that correspond to a permutation arranging the entries of \(\varvec{a}^{(n)}_{k}\) in non-increasing order. The thresholding indices \(s_r\) and \(t_r\) are given as \(s_r = \textstyle \min \big \{m' \mid \sum _{m=1}^{m'} c^{(n)}_{r,j_m} \ge \theta ^{(n)}_r \big \} ~~\text {and}~~ t_r = \textstyle \min \big \{m' \mid \sum _{m= m'}^{I_n} c^{(n)}_{r,j_m} < \theta ^{(n)}_r \big \}\).

For convenience, we denote \(\bar{g}^{(n)}(\varvec{a}^{(n)}_{k}) = \sum _{{m}=1}^{2} \lambda _m g^{(n)}_m(\varvec{a}^{(n)}_{k}) + g^{(n)}_{\ge 0}(\varvec{a}^{(n)}_{k})\). By adding our structured regularizers to the loss of NTF, we define the following minimization problem:

$$\begin{aligned} A^* = \mathop {\mathrm{\arg min}}\limits _{A} D_{\varOmega }(\mathcal {{X}}\Vert \hat{\mathcal {{X}}}) + \sum _{{n}=1}^{N}\sum _{{k}=1}^{K} \left( \bar{g}^{(n)}(\varvec{a}^{(n)}_{k}) + \lambda _3 g^{(n)}_{3}(\varvec{a}^{(n)}_{k}) \right) .&\end{aligned}$$
(8)

Note that when \(\lambda _1 = \lambda _2 = \lambda _3 = 0\), our method reduces to the original NTF, and when \(\lambda _2 = \lambda _3 = 0\), it can be regarded as a tensor extension of the graph regularized non-negative matrix factorization; our method thus includes those methods as special cases.

4 Parameter Estimation

We present an efficient parameter estimation procedure for obtaining a local minimizer of our proposed method. We employ a scaled formulation of the Alternating Direction Method of Multipliers (ADMM) for NTF [15]. The minimization problem for our proposed method can be rewritten:

$$\begin{aligned}&\min _{A,\mathcal {{Z}}} D_{\varOmega }(\mathcal {{X}}\Vert \mathcal {{Z}}) + \sum _{{n}=1}^{N}\sum _{{k}=1}^{K} \bar{g}^{(n)}({\varvec{b}^{(n)}_k}) + \lambda _3 g^{(n)}_3({\varvec{a}^{(n)}_k}) \nonumber \\&~\text {subject to}~ \mathcal {{Z}} = \hat{\mathcal {{X}}}, {\varvec{a}^{(n)}_k} = {\varvec{b}^{(n)}_k} ~(\forall n, k), \end{aligned}$$
(9)

where \(\mathcal {{Z}}\) and \({\varvec{b}^{(n)}_k}\) are auxiliary variables. To solve our problem efficiently while keeping both the constraints and separability, we define an augmented Lagrangian for our problem:

$$\begin{aligned}&L_{\rho }(A, B, \mathcal {{Z}}) = D_{\varOmega }(\mathcal {{X}}\Vert \mathcal {{Z}}) + \frac{\rho }{2}\Vert \mathcal {{Z}} - \hat{\mathcal {{X}}} + \mathcal {{U}} \Vert _{\mathcal {{F}}}^2 \nonumber \\&+ \sum _{{n}=1}^{N}\sum _{{k}=1}^{K} \bar{g}^{(n)}({\varvec{b}^{(n)}_k}) + \lambda _3 g^{(n)}_3({\varvec{a}^{(n)}_k}) + \frac{\rho }{2}\big \Vert {\varvec{a}^{(n)}_k} - {\varvec{b}^{(n)}_k} + {\varvec{u}^{(n)}_k}\big \Vert _2^2, \end{aligned}$$
(10)

where \(\mathcal {{U}}\) and \({\varvec{u}^{(n)}_k}\) are Lagrangian multipliers and \(\rho \) is a step-size parameter. We summarize the minimization procedure for our proposed method in Algorithm 1. The ADMM updates can be calculated efficiently if a simple minimization operator exists for each of \({\varvec{a}^{(n)}_k}\) and \({\varvec{b}^{(n)}_k}\).
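The structure of the scaled-form updates can be illustrated on a toy problem that is much simpler than ours: non-negative least squares with the splitting \(\varvec{x} = \varvec{z}\), \(\varvec{z} \ge 0\). This is only a sketch of the ADMM mechanics (quadratic subproblem, proximal step, dual update), not the paper's actual algorithm; all sizes and the penalty parameter are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
C = rng.standard_normal((20, 5))
d = rng.standard_normal(20)

# Scaled-form ADMM for min_{x>=0} 0.5*||C x - d||^2, splitting x = z (z >= 0)
rho = 1.0
x = np.zeros(5); z = np.zeros(5); u = np.zeros(5)
M = np.linalg.inv(C.T @ C + rho * np.eye(5))
for _ in range(2000):
    x = M @ (C.T @ d + rho * (z - u))  # smooth quadratic subproblem
    z = np.maximum(x + u, 0.0)         # proximal step: projection onto z >= 0
    u = u + x - z                      # dual (scaled multiplier) update

# KKT check: z is feasible and the gradient satisfies complementarity
g = C.T @ (C @ z - d)
assert (z >= 0).all()
assert (np.abs(g[z > 1e-6]) < 1e-3).all() and (g[z <= 1e-6] > -1e-3).all()
```

In our method the quadratic subproblem is replaced by the Laplacian-regularized least squares of Eq. (11) and the projection by the proximity operator of the structured penalties.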

[Algorithm 1: the ADMM-based minimization procedure for our proposed method]

The loss function with respect to \({\varvec{A}^{(n)}}\) and \({\varvec{b}^{(n)}_k}\) contains the graph Laplacian regularizer and the non-separable graph-based and group-based penalties, respectively. Thus the main difficulty with our proposed method lies in the minimization of \({\varvec{A}^{(n)}}\) and \({\varvec{b}^{(n)}_k}\), whose minimization problems can be rewritten:

$$\begin{aligned}&{\varvec{A}^{(n)}} = \mathop {\mathrm{\arg min}}\limits _{{\varvec{A}^{(n)}}} \frac{\rho }{2} \Vert \bar{\mathcal {{Z}}}_n - {\varvec{A}^{(n)}} {\varvec{V}_n}^\mathrm{{\top }}\Vert _2^2 + \frac{\rho }{2} \Vert {\varvec{A}^{(n)}} - \bar{\varvec{V}}_n\Vert _2^2 + \lambda _3 \sum _{{k}=1}^{K} g^{(n)}_3({\varvec{a}^{(n)}_k}) \end{aligned}$$
(11)
$$\begin{aligned}&{\varvec{b}^{(n)}_k} = \mathop {\mathrm{\arg min}}\limits _{{\varvec{b}^{(n)}_k}} \bar{g}^{(n)}({\varvec{b}^{(n)}_k}) + \frac{\rho }{2} \Vert \bar{\varvec{v}}^{(n)}_k -{\varvec{b}^{(n)}_k} \Vert _2^2, \end{aligned}$$
(12)

where \(\bar{\mathcal {{Z}}} = \mathcal {{Z}} + \mathcal {{U}}\), \(\varvec{V}_n = \odot _{n' \ne n} \varvec{A}^{(n')}\), \(\bar{\varvec{V}}_n = {\varvec{B}^{(n)}} - {\varvec{U}^{(n)}}\), and \(\bar{\varvec{v}}^{(n)}_k = {\varvec{a}^{(n)}_k} + {\varvec{u}^{(n)}_k}\). We efficiently solve the minimization of Eq. (11) by using the fact that it corresponds to the loss function of the graph regularized alternating least squares [21], which can be solved approximately with a per-iteration cost dominated by the number of non-zero elements of the graph Laplacian.
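For intuition, setting the gradient of Eq. (11) to zero yields a Sylvester-type linear system in \(\varvec{A}^{(n)}\). The sketch below solves it directly by vectorizing with Kronecker products on hypothetical toy sizes; this is only for illustration, since the paper uses the conjugate gradient method, which scales to realistic dimensions.

```python
import numpy as np

rng = np.random.default_rng(5)
I_n, K, J = 6, 3, 10        # mode size, rank, rows of the Khatri-Rao product
lam3, rho = 0.5, 1.0        # hypothetical hyperparameters

V = rng.random((J, K))      # stand-in for V_n (Khatri-Rao of other factors)
Zbar = rng.random((I_n, J)) # stand-in for the unfolded Z + U
Vbar = rng.random((I_n, K)) # stand-in for B^(n) - U^(n)

W = rng.random((I_n, I_n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(1)) - W   # graph Laplacian for the n-th mode

# Setting the gradient of Eq. (11) to zero gives the Sylvester equation
#   (2*lam3/rho) * L @ A + A @ (V.T @ V + I) = Zbar @ V + Vbar,
# solved here by vectorizing with Kronecker products (fine at toy size).
B = V.T @ V + np.eye(K)
S = np.kron((2 * lam3 / rho) * L, np.eye(K)) + np.kron(np.eye(I_n), B.T)
A = np.linalg.solve(S, (Zbar @ V + Vbar).ravel()).reshape(I_n, K)

# Check the first-order optimality condition of Eq. (11)
grad = rho * (A @ (V.T @ V) - Zbar @ V) + rho * (A - Vbar) + 2 * lam3 * L @ A
assert np.allclose(grad, 0.0, atol=1e-6)
```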

The minimization problem in Eq. (12) corresponds to the calculation of the proximity operator, which is defined as \(\mathrm{prox}_{g}(\varvec{v}) = \mathop {\mathrm{\arg min}}\limits _{\varvec{b}}~ g(\varvec{b}) + \frac{1}{2} \Vert \varvec{v} - \varvec{b} \Vert _2^2\). We present a minimization procedure for Eq. (12) by leveraging the properties of the proximity operator; a minimizer for the sum of the non-negative indicator function and the other convex functions can be obtained via the following property [26]: \(\mathrm{prox}_{g + g_{\ge 0}}(\varvec{v}) = \max \big ( \mathrm{prox}_{g}(\varvec{v}), \varvec{0} \big )\). Thus, if we have a minimizer for \(\lambda _1 g^{(n)}_1 +\lambda _2 g^{(n)}_2\), we can attain the exact minimizer for \(\bar{g}^{(n)}\) by setting the negative parameters to zero. A minimizer for \(\lambda _1 g^{(n)}_1 +\lambda _2 g^{(n)}_2\) can be calculated by a submodular function minimization procedure: because the penalty functions of GFL and HOFL are the Lovász extensions [19] of graph-representable submodular functions [11], we can attain a minimizer for \(\lambda _1 g^{(n)}_1 +\lambda _2 g^{(n)}_2\) with an efficient parametric network flow algorithm [7, 24, 30]. We show the details of our minimization procedure for this function in the appendix.
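The prox-then-clip property above can be illustrated numerically. Since the GFL/HOFL prox requires the network-flow machinery, the sketch below uses the separable \(\ell _1\) penalty as a stand-in, comparing the closed-form "prox then set negatives to zero" against a brute-force minimization of the constrained objective.

```python
import numpy as np

def prox_l1(v, lam):
    # Proximity operator of lam*||.||_1 (soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def prox_l1_nonneg(v, lam):
    # Brute-force prox of lam*||.||_1 + indicator(x >= 0), per coordinate
    grid = np.linspace(0.0, 10.0, 100001)
    out = np.empty_like(v)
    for i, vi in enumerate(v):
        obj = lam * grid + 0.5 * (grid - vi) ** 2
        out[i] = grid[np.argmin(obj)]
    return out

v = np.array([-1.5, -0.1, 0.2, 0.8, 3.0])
lam = 0.5
# Property used in Sect. 4: the prox of (penalty + non-negativity) equals
# the prox of the penalty followed by setting negative entries to zero.
composed = np.maximum(prox_l1(v, lam), 0.0)
assert np.allclose(composed, prox_l1_nonneg(v, lam), atol=1e-4)
```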

5 Related Work

Many articles have applied NTF to analyze spatio-temporal data. Kimura et al. proposed a special NTF that decomposes a three-way tensor into two factor matrices and a three-mode tensor for extracting log messages related to network failures [16]. Yang et al. proposed a combination of NTF without regularizers and post-processing for modeling user activities [32]. Takeuchi et al. proposed an NTF that simultaneously decomposes multiple tensors to extract patterns appearing across different tensors [25]. NTF was also used to extract spatio-temporal patterns from human-flow data [10]. However, none of those methods employed regularizers in NTF, and they were not applicable to missing values. One exception is the paper of Sun and Axhausen [23], who proposed a probabilistic non-negative Tucker decomposition for discovering interactions among factors; however, they did not incorporate spatial and temporal structures into regularizers. Han and Moutarde proposed an extension of NTF for predicting future observations [14], but they did not consider spatial structures; our method could be applied in their framework to utilize spatial and temporal regularizers. The estimation procedures of these methods were based on multiplicative update rules and the EM algorithm, whereas our proposed method utilizes graphs and groups of spatial and temporal features to regularize parameters and employs ADMM as its estimation procedure.

6 Experiments

We conducted missing value completion experiments with a traffic flow data set provided by City Pulse [1] and two bike-sharing system data sets recorded in Washington D.C. and New York.

The traffic flow data consist of the numbers of cars that passed 441 observation locations every thirty minutes in Aarhus, Denmark. We picked 30 days from August 2nd to 31st, 2014, and constructed a three-way tensor \(\mathcal {{X}} \in \mathbb {R}^{48 \times 30 \times 441}\) whose modes correspond to 48 daily time points, 30 days, and 441 observation locations, respectively. From the bike-sharing system data in Washington D.C. and New York, we employed 15 days from April 1st to 15th with 351 and 344 bike stations, respectively, and constructed three-way tensors \(\mathcal {{X}} \in \mathbb {R}^{24 \times 15 \times 351}\) and \(\mathcal {{X}} \in \mathbb {R}^{24 \times 15 \times 344}\) whose values were the numbers of bikes returned to each station in an hour. For the time mode, we utilized the adjacency of time points as a graph. For the day mode, we employed the adjacency of days and the days of the week as a graph and groups, respectively. For the location mode, we used the inverse of the Euclidean distance between GPS locations and clusters obtained by k-nearest neighbors (\(k=5,10\)) as a graph and groups, respectively.
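A location-mode graph of the kind described above can be sketched as follows: an inverse-distance adjacency restricted to each location's k nearest neighbors. The coordinates here are hypothetical, and the exact construction in our experiments may differ in details.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 20, 5
coords = rng.random((n, 2))  # hypothetical (lat, lon) of sensor locations

# Pairwise Euclidean distances between locations
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(-1))

# Sparse adjacency: inverse distance to each location's k nearest neighbors
W = np.zeros((n, n))
for i in range(n):
    nn = np.argsort(dist[i])[1:k + 1]  # skip the location itself
    W[i, nn] = 1.0 / (dist[i, nn] + 1e-12)
W = np.maximum(W, W.T)  # symmetrize

assert np.allclose(W, W.T)
assert (W >= 0).all() and np.count_nonzero(W) >= n * k
```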

We used the Euclidean distance as the divergence in the experiments. We compared our proposed method (Proposed 1) and a variant with only the graph Laplacian regularizer (Proposed 2, \(\lambda _1=\lambda _2=0\)) against NTF estimated by ADMM [15] (ADMM), NTF with the graph Laplacian regularizer [6] estimated by a multiplicative update rule that accounts for missing values (Multi+Lap) [9], and NTF estimated by the multiplicative update rule (Multi). We set the proportion of observed elements to \(p=\{0.1, 0.01, 0.005, 0.001\}\). By five-fold cross validation, we selected K and the other hyperparameters from \(K=\{3, 5, 10\}\) and \(\{0.1, 1, 10\}\), respectively. We utilized the normalized RMSE (NRMSE) and the normalized deviation (ND) as error measures:

$$\begin{aligned} \mathrm{NRMSE} = \sqrt{(1/|\varOmega |)\sum _{(p,t)\in \varOmega }(x_{p,t} - \hat{x}_{p,t})^2} / Q,\end{aligned}$$
(13)
$$\begin{aligned} \mathrm{ND} = (1/|\varOmega |)\sum _{(p,t)\in \varOmega }|x_{p,t} - \hat{x}_{p,t}| / Q, \end{aligned}$$
(14)

where \(Q=(1/|\varOmega |)\sum _{(p,t)\in \varOmega }|x_{p,t}|\). We ran our experiments five times with different randomly selected missing values.
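The two error measures of Eqs. (13)-(14) translate directly into numpy; the sketch below evaluates them over a boolean mask of held-out elements.

```python
import numpy as np

def nrmse_nd(x, x_hat, mask):
    # Error measures of Eqs. (13)-(14) over the held-out elements in mask
    diff = x[mask] - x_hat[mask]
    q = np.mean(np.abs(x[mask]))           # Q: mean absolute observed value
    nrmse = np.sqrt(np.mean(diff ** 2)) / q
    nd = np.mean(np.abs(diff)) / q
    return nrmse, nd

x = np.array([1.0, 2.0, 3.0, 4.0])
x_hat = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([True, True, False, True])
nrmse, nd = nrmse_nd(x, x_hat, mask)
assert nrmse == 0.0 and nd == 0.0  # perfect interpolation gives zero error
```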

Table 1. NRMSE for the traffic flow data of our proposed method (Proposed 1), our proposed method with the graph Laplacian regularizer (Proposed 2), NTF estimated by ADMM (ADMM), NTF with the graph Laplacian regularizer estimated by the multiplicative update rule (Multi+Lap), and NTF (Multi)
Table 2. NRMSE for the bike-sharing record data of Washington D.C.
Table 3. NRMSE for the bike-sharing record data of New York
Table 4. ND for the traffic flow data of our proposed method (Proposed 1), our proposed method with the graph Laplacian regularizer (Proposed 2), NTF estimated by ADMM (ADMM), NTF with the graph Laplacian regularizer estimated by the multiplicative update rule (Multi+Lap), and NTF (Multi)
Table 5. ND for the bike-sharing record data of Washington D.C.
Table 6. ND for the bike-sharing record data of New York

The results are shown in Tables 1, 2, 3, 4, 5, and 6, where the left and right values in a cell correspond to the average and the standard deviation, respectively. Our proposed methods showed the best performance in every setting and were robust to a large portion of missing values on every data set (\(p=\{0.01, 0.005, 0.001\}\)). Our proposed method with both the graph Laplacian and structured regularizers (Proposed 1) showed better or competitive performance compared with the variant with only the graph Laplacian regularizer (Proposed 2), which reflects the benefit of simultaneously combining graph-based and structured regularizers with graph and group structures. Furthermore, Proposed 2 always outperformed the same model estimated by the multiplicative update rule (Multi+Lap), which shows the advantage of our ADMM-based estimation procedure. Thus, both our proposed model and its parameter estimation procedure contributed to the improvements in missing value interpolation. The existing methods performed poorly in settings where a large portion of the tensor elements was missing.

Fig. 2. Time factors of Proposed 1 on the traffic flow data (Color figure online)

Fig. 3. Time factors of Multi+Lap on the traffic flow data (Color figure online)

Fig. 4. Day factors of Proposed 1 on the traffic flow data (Color figure online)

Fig. 5. Day factors of Multi+Lap on the traffic flow data (Color figure online)

Fig. 6. A spatial pattern of the blue factor of Proposed 1 on the traffic flow data (Color figure online)

Fig. 7. A spatial pattern of the blue factor of Multi+Lap on the traffic flow data (Color figure online)

Fig. 8. Time factors of Proposed 1 on the bike-sharing data of Washington D.C. (Color figure online)

Fig. 9. Time factors of Multi+Lap on the bike-sharing data of Washington D.C. (Color figure online)

Fig. 10. Day factors of Proposed 1 on the bike-sharing data of Washington D.C. (Color figure online)

Fig. 11. Day factors of Multi+Lap on the bike-sharing data of Washington D.C. (Color figure online)

Fig. 12. A spatial pattern of the yellow factor of Proposed 1 on the bike-sharing data of Washington D.C. (Color figure online)

Fig. 13. A spatial pattern of the yellow factor of Multi+Lap on the bike-sharing data of Washington D.C. (Color figure online)

To qualitatively check interpretability, we show the factors extracted by our proposed method (Proposed 1) and by the existing NTF with the Laplacian regularizer (Multi+Lap) from the traffic flow data in Figs. 2, 3, 4, 5, 6, and 7, where \(p=0.1\). The degrees of freedom (DoF) in the figures correspond to the number of segments in a factor matrix. Thanks to the Laplacian and structured regularizers, our proposed method extracted interpretable latent factors in which both smooth and flat properties appear, and whose DoF are far smaller than those of NTF with the Laplacian regularizer. Factors with low DoF make change points easy to find. For example, in Fig. 2 the blue factor has a change at 3 am, gradually grows until 6 am, and then stays constant until 3 pm; it also takes the same value from day 2 to day 6 and from day 8 to day 13. Thus, we can easily understand that the blue factor in Figs. 2 and 4 corresponds to an activity that occurred on weekdays during daylight, with the spatial pattern shown in Fig. 6. In contrast, NTF with the Laplacian regularizer resulted in messy factors. We also show the factors for the bike-sharing data in Washington D.C. in Figs. 8, 9, 10, 11, 12, and 13; our proposed method again extracted more interpretable patterns than the existing NTF. For example, the yellow factor of ours in Fig. 8 has a change point at 8 am; after peaking at noon, it keeps the same value from 1 pm to 5 pm and then gradually decreases to zero. The yellow factor in Fig. 10 takes the same high value on days 2, 3, 9, and 10. Thus, we confirmed that this factor indicates a weekend afternoon activity, with the spatial pattern shown in Fig. 12. Similar interpretations can be obtained from our other factors.

7 Conclusion

In this paper, we proposed a structurally regularized non-negative tensor factorization that incorporates both graph Laplacian and structured regularizers on the latent factors. For the structured regularizer, we employed the generalized fused lasso and the higher-order fused lasso to represent both graph-based and group-based information in time and space. We introduced a flexible and efficient parameter estimation method based on the alternating direction method of multipliers and showed a proximity operator for our unified structured regularizer. In experiments on a missing value imputation problem with three data sets, we confirmed that our proposed method showed the best quantitative performance and extracted more interpretable latent factors than existing non-negative tensor factorization methods.