1 Introduction

As Internet-of-Things (IoT) applications have proliferated rapidly in recent years, the vast number of distributed IoT devices and the resulting large volume of sensing data have attracted considerable research effort and triggered a wide range of applications, such as smart cities, transportation, and agriculture, owing to their capability of completing complex social and geographical sensing tasks. Such social and geographical sensing usually requires a large number of participants (usually IoT devices) to sense the surrounding environment, and the data must be collected from the sensing devices to a data sink because each device has only limited computation, storage, and energy support.

Due to the mass of data generated in IoT networks, it is difficult to continuously gather the original readings from the network, since such collection usually requires considerable communication and storage effort at intermediate nodes. Traditional ways of solving this problem include wavelet-based collaborative aggregation [1], cluster-based aggregation and compression [2, 3], and distributed source coding [4, 5]. All of them exploit the spatial correlation of readings among device nodes, but they may suffer robustness issues when dealing with cross-domain (temporal and spatial) event readings and exhibit limited compression capacity. In recent years, it has been suggested that compressed sensing (CS) may benefit compression in data aggregation scenarios. It avoids introducing excessive computational, communication, and storage overhead at each device. Therefore, it fits the capacity limitations of each sensing device and is viewed as a promising technology for data gathering in IoT networks.

However, compressive sensing relies on constant sparsity, which means a stable/fixed transform basis (though it need not be known explicitly) is required according to prior information about the sensed data. Such a situation hardly holds in real cases, and data with changing sparsity can degrade the recovery quality significantly. To address this problem, Wang et al. [13] proposed an adaptive data gathering scheme based on CS. "Adaptive" here has a twofold meaning: first, the CS reconstruction becomes adaptive to the sensed data, which is accomplished by adjusting the autoregressive (AR) parameters in the objective function; second, the number of measurements required for the sensed data is tuned adaptively according to the variation of the data. To further handle varying sensed data, Wang et al. suggested that each time a reconstruction is completed at the sink node, the result is approximately evaluated and fed back to the device nodes. The intuition is that the temporal correlation among historically reconstructed data can help estimate the current reconstruction result at the sink node. It is notable that compressing the original readings with CS-based methods [6–8] or matrix completion-based methods [9, 10] reduces the quality of the recovered data at the sink node, whereas routing the raw data to the sink to preserve fidelity brings considerable overhead. There is thus a conflict between a high compression ratio and high fidelity.

Data gathering and recovery with event readings is another problem studied in compressive sensing-based data gathering. A well-known method to tackle this problem is to decompose the data d into d_n + d_α, assuming d_α is sparse in the time domain since abnormal readings are usually sporadic. However, when environment changes occur, a significant number of readings may begin to change, which would further make d_α non-sparse in the spatial domain. Besides, though d_n is sparse in the spatial domain under a proper basis and d_α is sparse in the time domain, they are not necessarily sparse at the same time under the same basis. Therefore, it is doubtful that the sparsity of d is preserved across the time and space domains. Furthermore, the proper basis may vary with different events.

In this paper, we consider the temporal and spatial correlation of the sensed data and provide a low-rank matrix recovery-based data aggregation design, which compresses the data and addresses the event data gathering problem at the same time. Compared with existing work on data gathering in device networks, our approach makes the following contributions:

  • In IoT sensing and data aggregation scenarios, either the fidelity problem or the magnitude problem can be solved well, but rarely both simultaneously. This paper presents a new attempt to tackle both problems at the same time in large-scale IoT networks with diverse time/space-scale events, and to reduce global-scale communication cost without introducing intensive computation or complicated transmissions at each IoT device.

  • The experiments of this paper on a real environmental IoT sensing network observe that constant sparsity hardly holds in real cases with diverse time/space-scale events, while low-rank property may be true. This observation may provide a fresh vision for research in both compressive sampling applications and IoT sensing and data aggregation scenarios. This paper further generalizes the low-rank-based optimization design to a nuclear norm-based optimization design, to make the proposed approach more general and robust.

  • Theoretical analysis indicates that our matrix recovery-based method is robust to diverse time/space-scale event readings. Extensive experimental results show that event readings are kept almost unaltered under the proposed design, while our method generally outperforms typical compressive sensing [11] by about 10 dB in terms of SNR.

This paper is organized as follows. Section 2 introduces the preliminaries and the network model. Section 3 proposes the data gathering and recovery design. Section 4 analyzes the communication overhead of the proposed method in comparison to compressive sensing. Section 5 presents the experimental results on real environmental IoT sensing datasets from [12]. Then, we summarize related work in Section 6. Finally, we present the discussion in Section 7 and conclude this paper in Section 8.

2 Preliminaries and assumptions

2.1 Matrix recovery

Let X denote the original data, where X is an M×N matrix that is no longer required to be sparse, even under a proper basis (unlike compressive sensing). Let rank(X)=r, where r is assumed to be much smaller than min{M,N}.

According to [13], in order to recover X from linear combinations of its entries X_ij, the number of combinations needed is no larger than cr(M+N), where c is a constant.

Let A denote a linear map from RM×N space to Rp space; we have the following optimization problem:

$$ \underset{\boldsymbol{X}}{\text{min}}\quad ||\boldsymbol{X}||_{*} \quad \text{s.t.}\quad A(\boldsymbol{X}) = \boldsymbol{b} $$
(1)

where b is the measurement vector and ||·||_* denotes the nuclear norm (the sum of the singular values σ_i in the SVD). Note that we replace the rank (the number of nonzero singular values) with the nuclear norm (the sum of the singular values), which makes the problem a convex optimization problem that is solvable if \(p\geq Cn^{5/4}r\log n\) (where n=max(M,N) and C is a constant) [13]. Considering noisy measurements, we further modify the problem into the following form:

$$ \underset{\boldsymbol{X}}{\text{min}}\quad \mu ||\boldsymbol{X}||_{*}+ \frac{1}{2}||A(\boldsymbol{X})-\boldsymbol{b}||^{2}_{2} $$
(2)
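As a concrete illustration of the rank/nuclear-norm relationship underlying this relaxation (a standalone NumPy sketch, not part of the paper's pipeline), both quantities can be read off the singular values of a synthetic low-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 100x100 matrix of rank 5: the product of two thin Gaussian factors.
M, N, r = 100, 100, 5
X = rng.standard_normal((M, r)) @ rng.standard_normal((r, N))

s = np.linalg.svd(X, compute_uv=False)  # singular values, descending

rank = int(np.sum(s > 1e-8 * s[0]))     # rank: number of nonzero singular values
nuclear_norm = float(np.sum(s))         # nuclear norm: sum of singular values

print(rank)  # 5
```

Minimizing the nuclear norm instead of the rank keeps this preference for few large singular values while yielding a convex problem.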

2.2 Network model

We consider a participatory IoT sensing network in which a base station (BS) continuously collects data from participatory IoT devices. Due to the mass of data and the limited computation, storage, bandwidth, and energy at each device, the devices need to compress data with light computational overhead before transmission. Suppose there are N resource-constrained IoT devices in the network, whose positions can be determined after deployment via a self-positioning mechanism such as those proposed in [14–16]. The data collection path can then be predetermined by the base station and made known to each device. We further assume that the clocks of all nodes are loosely synchronized [17–19]. In particular, t_1, t_2, ⋯, t_i, ⋯, t_j, ⋯ represent the time instants in the network, where t_i < t_j given i < j, i, j ∈ Z^+. At every time instant, a device generates a reading. Let \(\mathcal {N}(u)\) denote the n nearest neighbors in an open neighborhood of u. Note that \(\mathcal {N}(u)\) could be the one-hop neighborhood or any neighboring area containing more nodes.

In this paper, we assume that participatory IoT devices follow a semi-honest model [20]. Specifically, they are honest and follow the protocol properly, except that they may record intermediate results. We assume that messages are securely transmitted within the network, which can be achieved via conventional symmetric encryption and key distribution schemes.

3 Temporal-spatial compressive sampling design

3.1 Problem formulation

Given M time instants and N devices in the network, the original data can be represented by an M×N matrix X, where each row contains the readings of all devices at one time instant and each column contains the readings of one IoT device at different time instants. X_{ij} (1≤i≤M, 1≤j≤N) denotes the reading of node j at time instant t_i.

Let A denote a linear map from RM×N space to Rp space and vec denote a linear map to transform a matrix to a vector by overlaying one column on another; we have

$$A(\boldsymbol{X})=\boldsymbol{\Phi} \cdot vec(\boldsymbol{X}) $$

where Φ is a p×MN matrix. Let Φ be a random matrix satisfying the RIP condition [21]. Before deployment, each device is equipped with a pseudo-random number generator. Once the device produces a reading at some time instant, the pseudo-random number generator generates a random vector of length p, using the combination of the current time instant and the device's ID as the random seed. The elements of this random vector are i.i.d. samples from a Gaussian distribution with mean 0 and variance 1/p. Note that this pseudo-random number generation at each device can be reproduced by the base station using the same generator.
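The seeded generation can be sketched as follows. The way the seed packs the time instant and device ID together is a hypothetical choice for illustration, since the text does not fix a particular encoding:

```python
import numpy as np

p = 64  # measurement length shared by every device and the base station

def measurement_vector(device_id: int, time_instant: int, p: int) -> np.ndarray:
    """Length-p vector with i.i.d. N(0, 1/p) entries, reproducible from the seed."""
    # Hypothetical seed encoding: combine the time instant and the device ID.
    rng = np.random.default_rng((time_instant << 32) | device_id)
    return rng.normal(0.0, np.sqrt(1.0 / p), size=p)

# The device and the base station regenerate the identical vector independently.
v_device = measurement_vector(device_id=7, time_instant=3, p=p)
v_sink = measurement_vector(device_id=7, time_instant=3, p=p)
print(np.array_equal(v_device, v_sink))  # True
```

Because the seed is a deterministic function of public quantities, no measurement matrix ever needs to be transmitted.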

Here p, the dimension of the range of A, is the number of measurements (namely the linear combinations) needed to recover X. Typically, p should be no less than cr(3M+3N−5r) [13]. Therefore, the problem can be formulated as the following optimization problem:

$$ \underset{\boldsymbol{X} \in R^{M\times N}}{\text{min}}\quad \frac{1}{2} ||A(\boldsymbol{X})-\boldsymbol{b}||^{2}_{2}+ \mu ||\boldsymbol{X}||_{*} $$
(3)

where the first term accounts for noise and the second term promotes low rank.

Remark 1

In IoT networks, devices may produce erroneous readings due to noisy environments or error-prone hardware. Erroneous readings usually occur at sporadic times and locations and thus have little impact on the data sparsity of the network; outlier/abnormal reading recovery/detection can therefore still work in compressive sensing-based data gathering. However, device measurements of the same event usually have strong inter-correlations and are geographically concentrated in a group of devices in close proximity. Such events may spread at diverse time and space scales and result in dynamic sparsity of the data, which violates the assumption of constant sparsity in compressive sensing and thus leads to poor recovery.

Remark 2

Given N M-dimensional signal vectors generated from N devices within M time instants, a good basis that makes all these vectors sparse may not be easy to find. Interestingly, [22] has analyzed different sets of data from two independent device network testbeds. The results indicate that the N×M data matrix may be approximately low rank under the various scenarios investigated. Therefore, the N×M temporal-spatial signal gathering problem with diverse-scale event data, which cannot be well addressed by CS methods, can be tackled under the low-rank framework.

3.2 Path along compressive collection

In this paper, we provide a generalization of current data gathering methods to temporal-spatial signals with diverse-scale events, during which device readings are compressively collected along the relay paths, e.g., a chain-type or mesh topology, to the sink.

At each device s_j, given the reading produced at time instant t_1, s_j generates a random vector Φ_{1j} of length p, with time instant t_1 and its ID s_j as the seed, and computes the vector X_{1j}Φ_{1j}. At the next time instant t_2, s_j generates a random vector Φ_{2j}, computes X_{2j}Φ_{2j}, and adds it to the previous vector X_{1j}Φ_{1j}. At time instant t_M, s_j computes X_{Mj}Φ_{Mj} and thus holds the summation \(S_{j}=\sum \limits ^{M}_{i=1} \boldsymbol {X}_{ij}\Phi _{ij}\).

In the network, each device s_j continuously updates its vector sum S_j until time instant t_M. After that, device s_j relays the vector S_j to the next device s_i. Then, s_i adds S_j to its own vector sum S_i and forwards S_i+S_j to the next device. After the collection along the relay paths, the sink receives \(\sum \limits ^{N}_{j=1}\sum \limits ^{M}_{i=1} \boldsymbol {X}_{ij}\Phi _{ij}\).
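A minimal simulation of this path-along collection (with hypothetical sizes) confirms that the sum arriving at the sink equals Φ·vec(X), i.e., the measurement vector b of Section 2:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, p = 20, 10, 30  # time instants, devices, measurement length

X = rng.standard_normal((M, N))  # X[i, j]: reading of device j at t_i
# Column j*M + i of Phi plays the role of the per-reading vector Phi_ij.
Phi = rng.normal(0.0, np.sqrt(1.0 / p), (p, M * N))

# Each device j accumulates S_j = sum_i X[i, j] * Phi_ij locally ...
S = np.zeros((N, p))
for j in range(N):
    for i in range(M):
        S[j] += X[i, j] * Phi[:, j * M + i]

# ... and the relay path simply adds the partial sums on the way to the sink.
sink = S.sum(axis=0)

# The sink therefore holds exactly b = Phi @ vec(X) (column-major stacking).
b = Phi @ X.flatten(order="F")
print(np.allclose(sink, b))  # True
```

Each device only ever stores and forwards one length-p vector, which is the source of the fixed communication cost noted in Remark 3.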

Remark 3

During data gathering, each node sends out only one vector of fixed length along the collection path, regardless of the distance to the sink (The property of the fixed-length vector will be discussed in Section 4).

Considering event data, recall that each row of the data matrix X (the signal in the network) represents the data acquired at one time instant from all devices, and each column represents the data obtained from one device at different time instants.

Outlier readings can come from internal errors at error-prone devices, for example, noise or systematic errors, or can be caused by external events due to environmental changes. The former are often sparse in the spatial domain, while the latter are usually low rank in the time domain. Each keeps its structure in the corresponding domain, but together they may lead to dynamic changes in data sparsity.

Let matrix X be decomposed into two parts, the normal one and the abnormal one: X=Xn+Xs. We could have:

$$ \begin{aligned} A(\boldsymbol{X})&=A\left(\boldsymbol{X}_{n}+\boldsymbol{X}_{s}\right)\\ &=A\cdot[I,I]\left(\boldsymbol{X}_{n},\boldsymbol{X}_{s}\right)^{T}\\ &=[A,A]\left[\boldsymbol{X}_{n},\boldsymbol{X}_{s}\right]^{T}\\ \end{aligned} $$
(4)

Based on Eq. 1, [A,A] is a new linear map, and the formulated problem can be solved in the framework of matrix recovery. That is, given the observation vector b ∈ R^p, the stacked data matrix [X_n; X_s] can be recovered in R^{2M×N}.

3.3 A basic design of data recovery

This section generalizes the data recovery method from compressive sensing to the realm of matrix recovery. The advantages of this extension are twofold: (1) it exploits the data correlation in both the time and space domains, and (2) diverse-scale event data, which would mute the power of the CS method due to sparsity changes, can be tackled with the proposed method.

According to Eqs. 3 and 4, the general form of the problem could be expressed with the following minimization problem:

$$ \underset{\boldsymbol{X} \in R^{M\times N}}{\text{min}}\quad \frac{1}{2} ||A(\boldsymbol{X})-\boldsymbol{b}||^{2}_{2}+ \mu ||\boldsymbol{X}||_{*} $$
(5)

where A(X)=Φ·T(X), with T(·) the transformation of a matrix into a vector obtained by stacking its columns one on top of another, and Φ a p×MN random matrix.

Note that Eq. 3 is the Lasso form of Eq. 2; under relaxed conditions, its solution coincides with the solution of Eq. 2 [23]. Therefore, we consider Eq. 3 (Eqs. 3 and 5 are essentially the same) instead of the original problem in Eq. 2.

This problem could be further transformed into the following form:

$$ \underset{\boldsymbol{X} \in R^{M\times N}}{\text{min}}\quad F(\boldsymbol{X})\triangleq f(\boldsymbol{X}) +P(\boldsymbol{X}) $$
(6)

where \(f(\boldsymbol{X})=\frac {1}{2} ||A(\boldsymbol {X})-\boldsymbol{b}||^{2}_{2}\) and P(X)=μ||X||_*.

Note that both parts are convex, but only the first part is differentiable; the second may not be. Then, we have

$$\nabla f(\boldsymbol{X})= A^{*}(A(\boldsymbol{X})-b) $$

where A^* is the adjoint (dual) operator of A.

Since A(X)=Φ·T(X), we have

$$\nabla f(\boldsymbol{X})= A^{*}(A(\boldsymbol{X})-\boldsymbol{b})=T^{-1}\left(\Phi^{T}(\Phi\, T(\boldsymbol{X})-\boldsymbol{b})\right) $$

where T^{−1} reshapes a length-MN vector back into an M×N matrix.

Because ∇f is linear, it is Lipschitz continuous. Then, there exists a positive constant L_f satisfying the following inequality:

$$||\nabla f(\boldsymbol{X})-\nabla f(\boldsymbol{Y})||_{F}\leq L_{f} ||\boldsymbol{X}-\boldsymbol{Y}||_{F}\quad\forall \boldsymbol{X},\boldsymbol{Y}\in R^{M\times N} $$

Lemma 1

A rough estimate of L_f is

$$\sqrt{MN\cdot \underset{i}{\text{max}}\: \left\{||\left(\Phi^{T}\Phi\right)_{i}||^{2}_{2}\right\}}, $$

where \(\left(\Phi ^{T}\Phi\right)_{i}\) is the ith column of the MN×MN matrix \(\Phi^{T}\Phi\).

Proof

\(||\nabla f(\boldsymbol {X})-\nabla f(\boldsymbol {Y})||_{F}=||\Phi ^{T}\Phi\, T(\boldsymbol {X}-\boldsymbol {Y})||_{2}\)

Let \(c_{i}\) denote the ith column of \(\Phi^{T}\Phi\), set \(T(X-Y)=\left (\begin {array}{c} x_{1} \\ \vdots \\ x_{MN} \end {array}\right)\),

and \(h= \underset {i}{\text {max}}\; \left \{||c_{i}||^{2}_{2}\right\}\); then

$$\begin{aligned} ||\Phi^{T} \Phi\, T(X-Y) ||^{2}_{2}&=\left|\left|\sum\limits^{MN}_{i=1} x_{i}c_{i}\right|\right|^{2}_{2}\\ &\leq \left(\sum\limits^{MN}_{i=1} |x_{i}|\,||c_{i}||_{2}\right)^{2}\\ &\leq h\left(\sum\limits^{MN}_{i=1} |x_{i}|\right)^{2}\\ &\leq MN\cdot h\sum\limits^{MN}_{i=1} x_{i}^{2}=MNh\,||X-Y||^{2}_{F} \end{aligned} $$

where the second step uses the triangle inequality and the last step uses the Cauchy–Schwarz inequality. Thus, \(L_{f}\leq \sqrt {MNh}\). □
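Since ∇f is linear, the exact Lipschitz constant is the spectral norm of Φ^TΦ; the sketch below (with illustrative sizes) checks that the lemma's estimate indeed upper-bounds it:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, p = 8, 6, 40
Phi = rng.normal(0.0, np.sqrt(1.0 / p), (p, M * N))

G = Phi.T @ Phi                    # the MN x MN operator behind grad f
L_exact = np.linalg.norm(G, 2)     # true Lipschitz constant: sigma_max(G)

h = np.max(np.sum(G**2, axis=0))   # largest squared column norm of G
L_rough = np.sqrt(M * N * h)       # Lemma 1's rough estimate

print(L_exact <= L_rough)  # True: the estimate is valid, though typically loose
```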

Remark 4

A much smaller L_f can often be found in real scenarios and may help the algorithm converge more quickly. The experimental results of this paper show that L_f can be much smaller than the rough estimate above, given a matrix sampled from a Gaussian distribution.

Consider the following quadratic approximation of F(·) in Eq. 6 at Y:

$$ \begin{aligned} Q_{\tau}(X,Y)&\triangleq f(Y)+\langle\nabla f(Y),X-Y\rangle\\ &\quad+\frac{\tau}{2}||X-Y||^{2}_{F} +P(X)\\ &= \frac{\tau}{2}||X-G||^{2}_{F}+P(X)+f(Y)\\ &\quad-\frac{1}{2\tau}||\nabla f(Y)||^{2}_{F}\\ \end{aligned} $$
(7)

where τ>0 is a given parameter and \(G=Y-\tau^{-1}\nabla f(Y)\).

Since the above function of X is strongly convex, it has a unique global minimizer.

Considering the minimization problem

$$ \underset{X\in R^{M\times N}}{\text{min}}\quad \frac{\tau}{2}||X-G||^{2}_{F}+\mu||X||_{*} $$
(8)

where G ∈ R^{M×N}. Note that if \(G=Y-\tau^{-1}A^{*}(A(Y)-b)\), then the above minimization problem is a special case of Eq. 7 with \(f(X)=\frac {1}{2}||A(X)-b||^{2}_{2}\) and P(X)=μ||X||_* when we ignore the constant term.

Let S_τ(G) denote the minimizer of Eq. 8. According to [24], we further have

$$S_{\tau}(G)=U\cdot diag((\delta-\mu/\tau)_{+})\cdot V^{T} $$

given the SVD \(G=Y-\tau^{-1}A^{*}(A(Y)-b)=U\cdot diag(\delta)\cdot V^{T}\). Here, for a given vector x ∈ R^p, we let x_+=max{x,0}, where the maximum is taken component-wise.
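The minimizer S_τ(G) is the singular value shrinkage operator; a minimal NumPy sketch (illustrative, with an arbitrary threshold standing in for μ/τ):

```python
import numpy as np

def svt(G: np.ndarray, threshold: float) -> np.ndarray:
    """S_tau(G): shrink every singular value by the threshold and rebuild."""
    U, sigma, Vt = np.linalg.svd(G, full_matrices=False)
    return (U * np.maximum(sigma - threshold, 0.0)) @ Vt  # U diag((sigma-t)_+) V^T

rng = np.random.default_rng(3)
G = rng.standard_normal((30, 20))
Xhat = svt(G, threshold=2.0)

# Shrinkage can only reduce the rank, never increase it.
print(np.linalg.matrix_rank(Xhat) <= np.linalg.matrix_rank(G))  # True
```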

Based on the accelerated proximal gradient (APG) design given in [13, 24], we set t_0=t_1=1 and τ_k=L_f, and let {X_k},{Y_k},{t_k} be the sequences generated by APG. For k=1,2,3,⋯, we have

  • Step 1: Set \(Y_{k}=X_{k}+\frac {t_{k-1}-1}{t_{k}}\left (X_{k}-X_{k-1}\right)\)

  • Step 2: Set Gk=Yk−(τk)−1A(A(Yk)−b). Compute \(S_{\tau _{k}}(G_{k})\) from the SVD of Gk

  • Step 3: Set \(X_{k+1}=S_{\tau _{k}}(G_{k})\)

  • Step 4: Set \(t_{k+1}=\frac {1+\sqrt {1+4(t_{k})^{2}}}{2}\)
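The four steps above can be sketched as follows. The continuation strategy that decays μ toward its target value is an assumption borrowed from common APG solvers rather than a step stated in the text, and the problem sizes are illustrative:

```python
import numpy as np

def svt(G, thresh):
    """Singular value shrinkage: U diag((sigma - thresh)_+) V^T."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return (U * np.maximum(s - thresh, 0.0)) @ Vt

def apg_recover(Phi, b, M, N, mu_target=1e-4, iters=500):
    """APG (Steps 1-4) for min_X 0.5*||Phi vec(X) - b||^2 + mu*||X||_*."""
    Lf = np.linalg.norm(Phi.T @ Phi, 2)  # Lipschitz constant of grad f
    Xk = Xprev = np.zeros((M, N))
    tk = tprev = 1.0
    mu = np.linalg.norm(b)               # continuation: start large, decay
    for _ in range(iters):
        Y = Xk + ((tprev - 1.0) / tk) * (Xk - Xprev)                 # Step 1
        grad = (Phi.T @ (Phi @ Y.flatten("F") - b)).reshape((M, N), order="F")
        G = Y - grad / Lf                                            # Step 2
        Xprev, Xk = Xk, svt(G, mu / Lf)                              # Step 3
        tprev, tk = tk, (1.0 + np.sqrt(1.0 + 4.0 * tk * tk)) / 2.0   # Step 4
        mu = max(0.7 * mu, mu_target)
    return Xk

# Recover a 10x10 rank-2 matrix from p = 80 noiseless Gaussian measurements.
rng = np.random.default_rng(4)
M, N, r, p = 10, 10, 2, 80
X0 = rng.standard_normal((M, r)) @ rng.standard_normal((r, N))
Phi = rng.normal(0.0, np.sqrt(1.0 / p), (p, M * N))
b = Phi @ X0.flatten("F")

Xhat = apg_recover(Phi, b, M, N)
rel_err = np.linalg.norm(Xhat - X0) / np.linalg.norm(X0)
print(rel_err)
```

With p well above the r(M+N−r) degrees of freedom, the relative error is typically small.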

Lemma 2

For any μ>0, the optimal solution X^* of Eq. 3 is bounded according to [13, 24], with ||X^*||_F ≤ χ, where

$$ \chi = \left\{ \begin{array}{ll} \text{min}\left\{||b||^{2}_{2}/(2\mu), ||X_{LS}||_{*}\right\} & \text{if}\ A\ \text{is surjective}\\ ||b||^{2}_{2}/(2\mu) & \text{otherwise} \end{array} \right. $$
(9)

with \(X_{LS}=A^{*}(AA^{*})^{-1}b\).

Based on this lemma, we can obtain a deterministic estimate of the convergence process and speed of the data recovery.

Let {Xk},{Yk},{tk} be the sequence generated by APG. Then, for any k≥1, we could have

$$F(X_{k})-F(X^{*})\leq \frac{2L_{f}||X^{*}-X_{0}||^{2}_{F}}{(k+1)^{2}} $$

Thus,

$$F(X_{k})-F(X^{*})\leq \varepsilon \quad \text{if}\quad k\geq \sqrt{\frac{2L_{f}}{\varepsilon}}(||X_{0}||_{F}+\chi)-1. $$

Let δ(X) denote \(dist(0,\partial(f(X)+\mu||X||_{*}))\); δ(X) measures how close X is to optimality and thus indicates the convergence progress of the data recovery. The process naturally stops when δ(X) is small enough.

Since ||X||_* is not differentiable, δ(X) may not be easy to compute. However, a good upper bound for δ(X) is provided by APG designs [24].

Given

$$\begin{aligned} \tau_{k}(G_{k}-X_{k+1})&=\tau_{k}(Y_{k}-X_{k+1})-\nabla f(Y_{k})\\ &=\tau_{k} (Y_{k}-X_{k+1})\\ &\quad-T^{-1}\left(\Phi^{T}(\Phi\cdot T(Y_{k})-\boldsymbol{b})\right) \end{aligned} $$

Note that

$$\tau_{k}(G_{k}-X_{k+1}) \in \partial (\mu ||X_{k+1}||_{*}) $$

let

$$\begin{aligned} S_{k+1}&\triangleq \tau_{k}(Y_{k}-X_{k+1})+\nabla f(X_{k+1})-\nabla f(Y_{k})\\ &= \tau_{k}(Y_{k}-X_{k+1}) +A^{*}(A(X_{k+1})-A(Y_{k}))\\ &=\tau_{k}(Y_{k}-X_{k+1})+T^{-1}\left(\Phi^{T}(\Phi \cdot T(X_{k+1}-Y_{k}))\right) \end{aligned} $$

we could have

$$S_{k+1} \in \partial (f(X_{k+1})+\mu ||X_{k+1}||_{*}) $$

Therefore, we have \(\delta(X_{k+1})\leq||S_{k+1}||_{F}\).

According to the derivation above, the stopping condition could be given as follows,

$$\hspace{45pt} \frac{||S_{k+1}||_{F}}{\tau_{k} \text{max}\{1,||X_{k+1}||_{F}\}}\leq Tol $$

where Tol is a user-defined tolerance, usually a moderately small threshold.

4 Advanced design of data recovery

This section generalizes the previous low-rank-based matrix recovery design to a nuclear norm-based design. Suppose X_0 is an M×N matrix of rank r with singular value decomposition (SVD) UΣV^*, where Σ is r×r, U is M×r, and V is N×r.

Let the subspace T denote the set of matrices of the form UY^*+XV^*, where X (resp. Y) is an arbitrary M×r (resp. N×r) matrix. UY^* and XV^* are both M×N matrices. Their spans have dimensions Mr and Nr, respectively, and the intersection of the two spans has dimension r^2. Therefore,

$$d_{T} = dim(T) = r(M + N-r) $$
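The dimension count d_T = r(M+N−r) can be verified empirically: sampling many elements UY^*+XV^* of T, vectorizing them, and taking the rank of the stacked matrix recovers exactly this number (the sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
M, N, r = 9, 7, 2

# Orthonormal U (M x r) and V (N x r) via QR of Gaussian matrices.
U, _ = np.linalg.qr(rng.standard_normal((M, r)))
V, _ = np.linalg.qr(rng.standard_normal((N, r)))

# Sample generic elements U Y^* + X V^* of T and stack their vectorizations.
samples = []
for _ in range(200):
    Y = rng.standard_normal((N, r))
    X = rng.standard_normal((M, r))
    samples.append((U @ Y.T + X @ V.T).ravel())

dim_T = np.linalg.matrix_rank(np.array(samples))
print(dim_T == r * (M + N - r))  # True: both equal 28
```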

Let T^⊥ denote the subspace of matrices spanned by the family xy^*, where x and y are arbitrary vectors orthogonal to the column spaces of U and V, respectively. Note that the spectral norm ||·|| is dual to the nuclear norm. The subdifferential of the nuclear norm at X_0 is

$$\partial ||\boldsymbol{X}_{0}||_{*}=\left\{\boldsymbol{Z} : P_{T}(\boldsymbol{Z})=\boldsymbol{UV}^{*}\ \text{and}\ ||P_{T^{\perp}}(\boldsymbol{Z}) ||\leq 1\right\} $$

where \(||\boldsymbol{UV}^{*}||_{F}=\sqrt r\) under the Euclidean (Frobenius) norm.

Theorem 1

Given X_0, an arbitrary M×N rank-r matrix, and ||·||_*, the matrix nuclear norm, and considering a Gaussian mapping Φ with p ≥ c·r(3M+3N−5r) for some c>1, the recovery is exact with probability at least \(1-2e^{(1-c)n/8}\), where n=max(M,N) [13].

Here the Gaussian mapping Φ consists of p independent M×N random matrices Φ_i with i.i.d. zero-mean Gaussian entries of variance 1/p; it acts as the linear operator \([\Phi (Z)]_{i} = \text {tr}(\Phi _{i}^{*}\cdot Z)\).

By stacking the columns of Z on top of one another, Φ can be equivalently written as a p×(MN)-dimensional matrix. Then, we have the dual multiplier

$$Y = \Phi^{*}\cdot\Phi_{T} (\Phi^{*}_{T}\cdot\Phi_{T})^{-1}(UV^{*}) $$

Remark 5

According to this theorem, each device sends out only one vector of fixed length cr(3M+3N−5r) along the collection path at the end of time instant t_M, and the original data is recovered with overwhelming probability.

Given p<(M+N−r)r, we can always find two distinct matrices Z and Z_0 of rank at most r with the property A(Z)=A(Z_0), no matter what A is. Let U ∈ R^{M×r}, V ∈ R^{N×r} be two matrices with orthonormal columns, and consider the linear space of matrices

$$T = \left\{UX^{*}+Y V^{*} : X \in R^{N\times r}, Y \in R^{M\times r}\right\} $$

Note that the dimension of T is r(M+N−r); if p<(M+N−r)r, there exists a nonzero Z=UX^*−YV^* in T such that Φ(Z)=0, i.e., Φ(UX^*)=Φ(YV^*) for two distinct matrices of rank at most r. Interestingly, unlike the results in compressive sensing, the number of measurements required is within a constant of this theoretical lower limit, with no extra log factor.

Compared with compressive sensing (CS)-based data gathering designs, the length of the vector sent by each device at each time instant with a CS-based design is O(log N). Based on recent results on bounds for low-complexity recovery models [25], the total amount of data collected during all M time instants will be O(MN log N) in compressive sensing.

When M is larger than O(N/log N), the proposed design exhibits an advantage in communication overhead. When M and N have the same order of magnitude, the proposed method has communication overhead similar to that of the CS-based method.
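As a back-of-the-envelope comparison (with purely illustrative values of N, M, r and of the constants c and s, which the text leaves unspecified), the two total overheads can be evaluated directly:

```python
import math

# Illustrative sizes and constants; c and s are not fixed by the text.
N, M, r, c, s = 1000, 2000, 5, 2, 10

# Proposed design: each of the N devices sends one fixed-length vector.
mr_total = N * c * r * (3 * M + 3 * N - 5 * r)

# CS-based design: an O(log N)-length vector per device per time instant.
cs_total = M * N * (2 * c * s * math.log(N) + s)

print(mr_total < cs_total)  # True in this M >> N/log(N) regime
```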

Before estimating the recovery error and its upper bound, we first introduce the restricted isometry property (RIP):

Definition 1

Let r=1,2,…,n. The isometry constant δ_r of A is the smallest quantity such that

$$(1-\delta_{r}) || X ||^{2}_{F}\leq || A(X) ||^{2}_{2}\leq (1 +\delta_{r}) || X ||^{2}_{F} $$

holds for all matrices of rank at most r.

If δr is bounded by a sufficiently small constant between 0 and 1, we say that A satisfies the RIP at rank r.

Theorem 2

Suppose X^* is the solution of the recovery method. Given noise z satisfying ||Φ(z)||_2 ≤ ε and \(||\Phi^{*}(z)||\leq\eta\), for some ε, η, if \(\delta_{4r}<\frac{1}{3}\), then \(||X-X^{*}||_{F}\leq C(\varepsilon+\eta)\) for some constant C [13].

Based on this theorem, given a random matrix Φ properly chosen from i.i.d. zero-mean Gaussian distribution with variance 1/p, the error of the proposed method could be bounded under the noise.

Based on the above analysis, we can see that (1) the vector kept by each device is bounded by cr(3M+3N−5r) and (2) the communication overhead of the network, i.e., the total number of messages, is O(Ncr(3M + 3N − 5r)). Compared with the overhead of CS-based data gathering methods, where the corresponding quantities are M·(2cs log N + s) per device and MN·(2cs log N+s) in total for low-rank data [25], it is easy to see that the proposed method can outperform CS-based data gathering methods in terms of communication given a large collection period M. To compare with a matrix completion-based method, we take STCDG proposed in [22] as an example under the same assumption that N nodes are deployed randomly. According to [13], the overhead of STCDG can be derived as O(n^{5/4}N^{1/2}r log n) (n=max(M,N)). STCDG may thus suffer a much larger overhead than our method in large-scale IoT networks.

Remark 6

According to the analysis, the larger the sampling period M at each device, the better the communication overhead efficiency of the proposed method.

5 Results

To evaluate the data recovery quality and robustness of the proposed method, we conduct experiments on both artificial datasets and real sensor datasets. The artificial datasets are constructed as a 100×100 matrix representing a randomly deployed sensor network of 100 nodes over a 100-h duration. The real sensor datasets are extracted from the CitySee project [12], which deployed a large-scale wireless sensor network consisting of multiple sub-networks in an urban area in Wuxi, China. Specifically, we compare the proposed method with a compressive sensing (CS)-based method proposed in [11] on the temperature and humidity data generated from 55 sensors over 115 h.

The CS-based method proposed in [11] generates the sampling matrix randomly and keeps the original readings sparse in the DCT domain. To detect abnormal readings, [11] decomposes the original reading d=d_0+d_s, where d_0 contains the normal readings and d_s contains the deviated values of abnormal readings, and constructs a sparse basis for d=[d_0,d_s]. The sink reconstructs sensor readings with linear programming (LP) techniques [26]. We generate the sampling matrix with the same distribution for both the CS-based method and our proposed method, with a sampling rate of about 47% on the original readings (a 115×55 matrix).

In the data gathering and recovery problem, event readings usually result in dynamic and diverse sparsity changes in both the time and space domains, which may seriously undermine the foundation of CS-based methods during environment changes. Prior works [6, 27–30] have attempted to tackle data recovery with small-scale event readings using CS-based methods, e.g., events reported from several devices due to device accidents or small-range environment changes. However, when events spread over large ranges and various time scales, it is doubtful whether CS-based data gathering methods can cope. In this paper, we conduct experiments to study the data recovery quality on data with both small-range and large-range events for both the CS-based method and the proposed method.

5.1 Recovery quality and robustness study on data with large-scale events

As shown in Figs. 1 and 2, the proposed method achieves high recovery quality with a large-range event in Figs. 1c and 2c. Event readings are recovered almost exactly the same as the original data in the spatial domain. As shown in Fig. 3b, c, although a large-range event leads to dynamic and diverse-scale sparsity changes and brings more challenges to data recovery, the proposed method generally achieves about 10 dB better recovery quality than the CS-based method. We further confirm this observation in (1) snapshots in the spatial and time domains of humidity data with a large-range event recovered by the proposed method and the CS-based method at the 5th, 25th, and 50th nodes in Figs. 4, 5, and 6, respectively, and (2) a snapshot in the spatial domain of temperature data with a large-range event at 115 h recovered by the MR method and the CS-based method in Fig. 7.

Fig. 1

3D contour map of humidity data with a large-range event. a Original humidity data with a large-range event. b CS-recovered humidity data with a large-range event. c MR-recovered humidity data with a large-range event

Fig. 2

3D contour map of temperature data with a large-range event at 115 h. a Original temperature data with a large-range event. b CS-recovered temperature data with a large-range event. c MR-recovered temperature data with a large-range event

Fig. 3

SNR comparison of MR and CS methods among different datasets. a SNR of MR- and CS- recovered temperature data. b SNR of MR- and CS-recovered temperature data with a small-range event. c SNR of MR- and CS-recovered temperature data with a large-range event

Fig. 4

Comparison of MR and CS methods on humidity data with large-range event at the 5th node. a Data recovered by MR at the 5th node. b Data recovered by CS at the 5th node

Fig. 5

Comparison of MR and CS methods on humidity data with large-range event at the 25th node. a Data recovered by MR at the 25th node. b Data recovered by CS at the 25th node

Fig. 6

Comparison of MR and CS methods on humidity data with large-range event at the 50th node. a Data recovered by MR at the 50th node. b Data recovered by CS at the 50th node

Fig. 7

Comparison of MR and CS methods on temperature data with large-range event. a Data recovered by MR. b Data recovered by CS

Meanwhile, the CS-based method cannot recover the data well, as shown in Figs. 1b and 2b. The recovered data in the event area are almost overwhelmed by noise due to the changes in the sparsity foundation of the CS-based method. Moreover, the recovery quality of the CS method in other areas (outside the event area) is also affected by event readings due to the violation of static sparsity. Therefore, the CS-based method has limited recovery capability and less robustness against large-scale events compared with the proposed method.

5.2 Recovery quality and robustness study on data with small-scale events

As shown in Fig. 8, the humidity data with a small-range event are plotted in 3D contour maps. The proposed method clearly recovers the data with high quality: event readings are easy to observe as the small hill in the maps of Fig. 8a, c. The CS-based method can only recover the data to some degree, as shown in Fig. 8b, since event readings are recovered in low quality and the recovered data in the event area are almost overwhelmed by noise. What is worse, some areas are obviously altered due to the change of sparsity. It is easy to see that the CS-based method provides much worse recovery robustness against small-scale events compared with the proposed method.

Fig. 8

3D contour map of humidity data with a small-range event. a Original humidity data with a small-range event. b CS-recovered humidity data with a small-range event. c MR-recovered humidity data with a small-range event

As shown in Fig. 3b, our method generally achieves about 10 dB better recovery quality than the CS-based method under small-scale events. We further confirm this observation in Fig. 9, which compares time-domain snapshots of the humidity data with a small-range event recovered by the proposed method and by the CS-based method at an arbitrary node.

Fig. 9

Comparison of MR and CS methods on humidity data with small-range event. a Humidity data recovered by MR at an arbitrary node. b Humidity data recovered by CS at an arbitrary node

5.3 Recovery quality and robustness study on data without events

The temperature and humidity data (original, recovered by the CS-based method, and recovered by the proposed method) are plotted as 3D contour maps in Figs. 10 and 11. Comparing Figs. 10c and 11c with Figs. 10a and 11a, the contour maps of the data recovered by the proposed method differ little from those of the original data. As the artificial datasets are generated randomly, the 2D contour maps in Fig. 12 make the comparison more apparent; we also plot the temperature and humidity data as 2D contour maps in Figs. 13 and 14 for clarity. The result is further confirmed by a quantitative study of recovery quality measured in SNR. Let \(X_{j}\in \mathbb {R}^{M}\) denote the readings of the jth node and \(\widehat {X}_{j}\) the corresponding recovered readings; the time-domain SNR of node j is defined as \({SNR}_{j}=-20\log _{10}\frac {||X_{j}-\widehat {X_{j}}||_{2}}{||X_{j}||_{2}}\). As shown in Fig. 3a, our proposed method achieves about 20 dB gain in the recovered data. We also measure the recovery performance by the root mean square error (RMSE). In the time domain, the RMSE of node j is \({RMSE}_{j}=\sqrt {\frac {\sum _{i=1}^{M}{\left (\widehat {X_{ij}}-X_{ij}\right)^{2}}}{M}}\); in the spatial domain, the RMSE of time slot i is \({RMSE}_{t_{i}}=\sqrt {\frac {\sum _{j=1}^{N}{\left (\widehat {X_{ij}}-X_{ij}\right)^{2}}}{N}}\). The RMSE measurements on the temperature and humidity data, shown in Figs. 15 and 16, indicate that our method incurs less error than the CS method.
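The SNR and RMSE definitions above can be sketched in a few lines of NumPy. In this sketch (function and array names are illustrative, not from the paper's code), readings are stored as an M × N matrix with rows indexing the M time slots and columns indexing the N nodes:

```python
import numpy as np

def snr_per_node(X, X_hat):
    """Time-domain SNR (dB) per node, columns indexing nodes:
    SNR_j = -20 * log10(||X_j - X_hat_j||_2 / ||X_j||_2)."""
    err = np.linalg.norm(X - X_hat, axis=0)
    sig = np.linalg.norm(X, axis=0)
    return -20.0 * np.log10(err / sig)

def rmse_time(X, X_hat):
    """Time-domain RMSE of each node over its M time slots."""
    M = X.shape[0]
    return np.sqrt(np.sum((X_hat - X) ** 2, axis=0) / M)

def rmse_spatial(X, X_hat):
    """Spatial-domain RMSE of each time slot over its N nodes."""
    N = X.shape[1]
    return np.sqrt(np.sum((X_hat - X) ** 2, axis=1) / N)
```

Note that the leading minus sign makes the SNR positive when the recovery error is smaller than the signal, consistent with the higher-is-better comparisons in Figs. 3, 15, and 16.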

Fig. 10

3D contour map of temperature data. a Original temperature data. b CS-recovered temperature data. c MR-recovered temperature data

Fig. 11

3D contour map of humidity data. a Original humidity data. b CS-recovered humidity data. c MR-recovered humidity data

Fig. 12

2D contour map of artificial data. a Original artificial data. b CS-recovered artificial data. c MR-recovered artificial data

Fig. 13

2D contour map of temperature data. a Original temperature data. b CS-recovered temperature data. c MR-recovered temperature data

Fig. 14

2D contour map of humidity data. a Original humidity data. b CS-recovered humidity data. c MR-recovered humidity data

Fig. 15

RMSE comparison of MR and CS methods on temperature data. a RMSE in spatial domain. b RMSE in time domain

Fig. 16

RMSE comparison of MR and CS methods on humidity data. a RMSE in spatial domain. b RMSE in time domain

As shown in Fig. 10b, the compressive sensing-based method can recover the temperature data to some degree. It is interesting to observe, however, that the CS-based method can hardly keep the recovery quality stable, whereas the proposed method achieves much better recovery quality as well as robustness. This is further confirmed by comparing the SNR of both methods at each sensor: the proposed method outperforms the CS-based method by about 10 dB in SNR, as shown in Fig. 3a.

6 Related work

In device networks, data gathering usually incurs considerable communication overhead. Traditional approaches to this problem include distributed source coding [31, 32], in-network collaborative wavelet transform [33–35], holistic aggregation [36], and clustered data aggregation and compression [37, 38]. Although these approaches exploit the spatial correlation of device readings to some extent, they lack the ability to support the recovery of diverse-scale events.

In the past decade, compressive sensing (CS) has gained increasing attention for its capability of sampling and reconstructing sparse signals [39, 40], and it has triggered a large variety of applications, ranging from image processing to geophysical data gathering [41].

In terms of data gathering, various CS-based approaches have been proposed for decentralized data compression and gathering in networked devices, aiming to efficiently collect data from a vast number of distributed nodes [6, 27, 28]. Liu et al. [7] present a compressive data collection scheme for IoT sensing networks that adopts a power-law decaying data model verified on real datasets. Zheng et al. [8] propose another method that handles data gathering in IoT sensing networks with a random walk algorithm. Xie and Jia [42] develop a clustering method using hybrid CS for device networks that significantly reduces the number of transmissions. Li et al. [43] apply the compressive sensing technique to data sampling and acquisition in IoT sensing networks and the Internet of Things (IoT). Mamaghanian et al. [44] demonstrate the potential of compressed sensing for low-complexity ECG signal acquisition and compression in wireless body device networks (WBSN). Zhang et al. [29] propose a compressive sensing-based approach for sparse target counting and positioning in IoT sensing networks. Tian and Giannakis [45] utilize the compressed sensing technique for the coarse sensing task of spectrum hole identification. In addition, several papers study CS for device networks with a focus on throughput, routing, video streaming processing, and sparse event detection [30, 46–48].

Cheng et al. [49] focus on dealing with continuously sensed data. Extracting the kernel or dominant dataset from big sensory data in WSNs provides another compression approach [50, 51].

In recent years, low-rank matrix recovery (LRMR) has extended the sparsity of vectors to the low rank of matrices, becoming another important method, beyond CS, for obtaining and representing data from only incomplete and indirect observations [10]. Keshavan et al. [52] compare the performance of three low-rank matrix completion algorithms under noisy observations. Zhang et al. [53] present a spatio-temporal compressive sensing framework for Internet traffic matrices. Yi et al. [9] take advantage of both the low-rankness and the DCT compactness features to improve recovery accuracy. Compared with prior work based on LRMR, our method achieves a better compression ratio and lower communication overhead.
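To illustrate the nuclear-norm machinery that LRMR methods build on (this is a generic sketch, not the paper's MRCS algorithm; function names, the threshold `tau`, and the iteration count are illustrative), the snippet below implements singular value thresholding, the proximal operator of the nuclear norm, together with a naive completion loop that alternates shrinkage with re-imposing the observed entries:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the prox operator of the nuclear norm.
    Returns argmin_X 0.5*||X - M||_F^2 + tau*||X||_* by shrinking each
    singular value of M by tau and dropping those that fall below zero."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

def complete_matrix(M_obs, mask, tau=1.0, n_iter=200):
    """Naive matrix completion: alternate a low-rank-promoting SVT step
    with re-imposing the observed entries given by the boolean mask."""
    X = np.where(mask, M_obs, 0.0)
    for _ in range(n_iter):
        X = svt(X, tau)          # promote low rank
        X[mask] = M_obs[mask]    # keep observed readings fixed
    return X
```

Practical LRMR solvers (e.g., the algorithms compared in [52]) use more careful step sizes and stopping rules, but the shrink-then-project structure is the same.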

7 Discussion

According to the analysis and experimental study, it is interesting to observe that the proposed method enables IoT networks to deal with both the fidelity problem and the magnitude problem simultaneously under diverse time/space-scale events, and to reduce global-scale communication cost without intensive edge computation.

The experiments in this paper on a real environmental IoT sensing network also reveal that constant sparsity hardly holds in real cases with diverse time/space-scale events, while the low-rank property may still hold. Although events may violate the constant sparsity assumed by compressive sensing and severely reduce its recovery quality, the proposed method still preserves the fidelity of event readings, achieving about 10 dB higher SNR than typical compressive sensing [11]. This observation may provide a fresh vision for research in both compressive sampling applications and IoT sensing and data aggregation scenarios.

However, it is worth noting that a limitation remains in cases where the low-rank property does not hold in the network. To mitigate this, the paper further generalizes the low-rank-based optimization design to a nuclear norm-based optimization design, making the proposed approach more general and robust. In future work, we would like to focus on further enhancing the performance of our method in IoT networks with events.

8 Conclusion

In this paper, we have shown the effectiveness and validity of cross-domain matrix recovery for data compression, gathering, and recovery through a study on environmental IoT sensing datasets; the proposed method could readily be extended to a large variety of other IoT application scenarios. In particular, we have demonstrated, via both theoretical analysis and experimental study, the capacity of the proposed MRCS method to deal with both the data fidelity and magnitude problems simultaneously in data gathering for IoT networks. The results show that the proposed MRCS method outperforms the original CS method in terms of recovery quality. Our work provides a new approach for both compressive sampling applications and IoT networks with diverse time/space-scale events, and suggests a general design through the relaxation from low-rank-based optimization to nuclear norm-based optimization.