
G-skyline query over data stream in wireless sensor network

  • Leigang Dong
  • Guohua Liu
  • Xiaowei Cui
  • Tianyu Li
Open Access
Article

Abstract

Sensors in a wireless sensor network continuously sample large amounts of data. Storing and mining these data can reveal useful information and support decision making. As an important technique for data mining and multi-criteria decision making, skyline computation identifies the individual points of interest to the user. To analyze groups of points, the group-based skyline (G-skyline) was proposed; it returns all Pareto optimal groups, that is, the groups which are not g-dominated by any other group with the same number of points. Existing G-skyline algorithms handle only static data. However, data streams are common in many applications, so it is important to design algorithms that query the G-skyline over a data stream. In this paper, we propose new algorithms to compute the G-skyline over a data stream. We present a sharing strategy and then two efficient algorithms: a point-arriving algorithm and a point-expiring algorithm. Experimental results on three kinds of synthetic data and a real stock data set show that our algorithms perform efficiently over a data stream.

Keywords

Wireless sensor network; Group-based skyline; Data stream; Sharing strategy

1 Introduction

The Internet of things is the inter-networking of physical devices, vehicles and other items embedded with various sensors. These wireless sensors collect large amounts of data from the terminals. The key question is how to extract useful information from this mass of data for a specific purpose.

As one of the important means of multi-criteria decision making, the skyline query plays an important role in applications such as sensor networks and data mining. The skyline of a data set includes all the points which are not dominated by any other point. Given a d-dimensional data set D, every point q can be written as (q[1], q[2], …, q[d]), where q[i] is the ith attribute value of q. For two points p = (p[1], p[2], …, p[d]) and q = (q[1], q[2], …, q[d]) in \( R^{d} \), we say q dominates p if q[j] ≤ p[j] for every j (1 ≤ j ≤ d) and q[j] < p[j] for at least one j. The skyline of D consists of all the points which are not dominated by any other point in D, so the skyline query identifies all the best individual points.

For example, in forest fire monitoring, wireless sensor nodes can perceive the nearby temperature, humidity and smoke density. When a fire breaks out, the nearby temperature increases and the humidity decreases, and the nearby sensors perceive these changes, so a wireless sensor network can be deployed to monitor fires. For convenience, we assume in this paper that the sensors sample data at the same time. As shown in Fig. 1(a), there is a data set D = {p1, p2, …, p11}, where each point represents a sensor node with two attributes: the inverse-temperature and the humidity. A skyline query on the sensor nodes returns the skyline points with lower inverse-temperature and lower humidity, as shown in Fig. 1(b), and these indicate the dangerous areas. The point p6 dominates the point p3 because the inverse-temperature and humidity of p6 are both smaller than those of p3. The skyline of the data set D consists of p1, p6 and p11. Therefore, the firemen can quickly identify these dangerous areas and take action earlier.
Fig. 1

An example of skyline. a Dataset, b skyline

However, firefighting resources are limited and we cannot check every area at the same time, so if we intend to select 2 areas to examine, the traditional skyline query does not return the result directly. To solve such problems, the group-based skyline query was proposed. The group-based skyline (G-Skyline for short) query identifies the best groups which are not g-dominated by any other group of the same size, and paper [1] proposed two algorithms for computing the G-Skyline. Compared with the traditional skyline, the G-Skyline provides more useful information in more complex settings such as wireless sensor networks, multi-criteria decision making and data mining. In the above example, the G-Skyline groups with 2 points are {p1, p6}, {p1, p11}, {p6, p11}, {p6, p3}, {p11, p8} and {p11, p10}, so the fire force could select one group from this result.

Although the G-Skyline is very useful, the existing algorithms focus on static data sets. In a wireless network, each sensor node may send its perceived data to the receiver at intervals, and environmental interference or node faults may affect the data being sent, so the received data are dynamic. We can regard these dynamic data as a data stream. In a data stream, each data item has a life cycle and is only valid within it, so when an item arrives or expires, the current active data set changes. Based on the data in Fig. 1, we give an example of a data stream in Fig. 2. At first, there are 3 points {p1, p2, p3}. At moment t, a new point p4 arrives, and the active points are {p1, p2, p3, p4}. At moment 2t, the old point p2 expires, and the active points are {p1, p3, p4}. As the data set changes, the set of groups changes too. For example, at first the groups with 2 points are {p1p2, p2p3, p1p3}. When p4 arrives, they change to {p1p2, p1p3, p1p4, p2p3, p2p4, p3p4}, so the groups containing p4 appear. When p2 expires, they change to {p1p3, p1p4, p3p4}, so the groups containing p2 are removed.
Fig. 2

An example of data stream

As the set of groups changes, the corresponding G-Skyline may change as well. For example, at first the G-Skyline groups are {p1p3, p2p3}, because p1p2 is g-dominated by p2p3 while p1p3 and p2p3 are not g-dominated by any other group. When p4 arrives, the G-Skyline groups are {p1p3, p1p4, p3p4, p2p3}, which shows that the arrival of a new point affects the G-Skyline result. When p2 expires, the G-Skyline changes to {p1p3, p1p4, p3p4}, which shows that the expiration of an old point affects the G-Skyline too. So, to keep the G-Skyline valid at all times in the data stream, we should update it whenever a new point arrives or an old point expires.

The naive method to find the G-Skyline over a data stream is to rerun the existing PWise algorithm of paper [1] whenever the data set changes. However, the data in a stream may change quickly, in which case repeating the PWise computation over the whole data set costs much time and performs much redundant computation, because some G-Skyline groups do not change their status when a new point arrives or an old point expires. For example, in Fig. 2, the groups p1p3 and p2p3 are G-Skyline groups both before and after p4 arrives at moment t. That is, in a data stream we need not recompute all the groups when a point arrives or expires, so the naive method is inefficient.

In this paper, we present two algorithms to efficiently find the G-Skyline over a data stream in a wireless sensor network. The point-arriving algorithm computes the new G-Skyline when a new point arrives, and the point-expiring algorithm computes the new G-Skyline when an old point expires. To make these two algorithms more efficient, we present pruning theorems that remove the groups which cannot affect the new G-Skyline.

We summarize our main contributions in brief as follows.
  • We present the problem of finding the G-Skyline with k points over a data stream in the wireless sensor network. This query will provide much more useful information.

  • We present a sharing strategy that lets us compute the new G-Skyline based on the existing result. According to the pruning theorems, many groups are pruned without computation.

  • We propose two algorithms to compute the new G-Skyline over a data stream.

  • We perform experiments on three kinds of synthetic data and a real stock data set.

Organization

The rest of this paper is organized as follows. In Sect. 2, we review the work related to our research. In Sect. 3, we first give the definitions and existing theorems of the G-Skyline, and then formally define the problem of the G-Skyline query over a data stream. Section 4 presents the two algorithms for the G-Skyline over a data stream and gives relevant examples. In Sect. 5, we show the experimental results and evaluate our algorithms. Finally, we conclude the paper and propose future work.

2 Related work

Since Borzsonyi et al. proposed the skyline operator [2] in 2001, the skyline has been studied in many fields, and many skyline algorithms have been proposed for different problems. Next, we describe the related work on skyline queries.

The BNL and D&C algorithms were proposed in paper [2]. BNL computes the skyline by scanning the whole data set while maintaining a candidate set, and D&C obtains the final skyline by computing the skyline of each subset. In the SFS algorithm [3], the skyline is returned after all the data have been sorted according to a monotone function. Bitmap [4] first maps each tuple to an m-bit vector and then obtains the skyline by computing on the vectors, but this algorithm is only suitable for static data sets. The NN algorithm [5] returns the skyline by filtering the nearest neighbor points. BBS manages the data set with an R-tree and visits only the nodes that contain result points.

In addition, there are skyline algorithms for specific settings. The sub-space skyline [6, 7, 8, 9] computes the skyline by dividing the data set into sub-spaces. The k-dominant skyline [10, 11] returns the points which are not k-dominated by any other point, and can reveal more potential information. The top-k skyline [12, 13, 14, 15, 30] finds the points ranked in the top k positions; this algorithm is only suitable for queries within a set volume limit. In recent years, the skyline query on uncertain data has been studied: a threshold q is given first, and the points whose probabilities are larger than q are returned as the probabilistic skyline [16, 17, 18]. [31, 32] introduced MapReduce to compute the skyline efficiently. [19] first proposed the skyline query over a data stream; the data in the sliding window were managed in an R-tree, and the skyline set was maintained by an interval tree. [20] presented the LOOK-OUT algorithm to compute the skyline over a data stream. In paper [21], the data in the sliding window were managed with a multi-layer grid structure. [22] proposed a parallel algorithm for window-based skyline targeting multicores.

Paper [23] returned the top-k composition skyline; however, it did not define the composition skyline formally. [24, 25, 26] defined and studied the group skyline query: they aggregate the values of the same attribute over k points to form a group, and then compare groups using the traditional dominance relation. The aggregate functions commonly used in these works are SUM, MAX and MIN. [27] defined the group dominance concept on uncertain data. In practice, it is difficult to select which aggregate function to use, so paper [1] proposed the Pareto optimal groups and the group-based skyline, which returns all the Pareto optimal solutions. [28, 41] proposed efficient skyline group algorithms over a data stream based on the algorithms in papers [24, 25], but the skyline groups in their algorithms rely on functions such as SUM, MAX and MIN, and the skyline groups under these functions are a subset of our G-skyline groups. The reason is that if a group G is g-dominated by \( G^{{\prime }} \) under the G-skyline definition, G is also dominated by \( G^{{\prime }} \) under the SUM function, but the converse does not hold. So these algorithms are not suitable for our G-skyline groups. Paper [42] proposed skyline algorithms over a data stream, but they focus on individual points rather than groups. [33, 34, 35, 36, 37, 38, 39, 40] discussed different skyline applications in wireless sensor networks, such as the continuous reverse skyline, spatial skyline, distributed dynamic skyline and probabilistic skyline queries.

3 Preparations

In this section, we will present the foundation work, including the relevant definitions, basic theorems and the G-Skyline algorithm in static data that will be used in our paper, then we propose our problem.

3.1 Definitions and theorems

Definition 1 (Skyline)

Let p and q be two different points from the same data set D. We say q dominates p, denoted by q \( \prec \) p, if q[j] ≤ p[j] for every j (1 ≤ j ≤ d) and q[j] < p[j] for at least one j, where p[j] is the jth attribute value of p. The skyline consists of all the points which are not dominated by any other point in D.
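Definition 1 can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the names `dominates` and `skyline` and the sample coordinates are assumptions made here, and smaller values are treated as better on every attribute, as in the definition.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(q: Sequence[float], p: Sequence[float]) -> bool:
    """True if q dominates p: q is no larger on every attribute
    and strictly smaller on at least one."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical 2-d sample (not the data of Fig. 1).
sample = [(1, 9), (3, 4), (4, 6), (7, 2), (8, 5)]
print(skyline(sample))
```

The quadratic scan is chosen only for readability; the algorithms surveyed in Sect. 2 avoid comparing every pair.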

Definition 2 (G-Dominance)

Given a data set D, let G1 = {p1, p2, …, pk} and G2 = {\( p_{1}^{{\prime }} ,p_{2}^{{\prime }} , \ldots ,p_{k}^{{\prime }} \)} be two different groups of k points from D. We say G1 g-dominates G2 if there exist two permutations of the k points, G1 = {pn1, pn2, …, pnk} and G2 = {\( p_{m1}^{{\prime }} ,p_{m2}^{{\prime }} , \ldots ,p_{mk}^{{\prime }} \)}, such that \( p_{ni} \preceq p_{mi}^{{\prime }} \) for every i (1 ≤ i ≤ k) and \( p_{ni} \prec p_{mi}^{{\prime }} \) for at least one i, where \( \preceq \) means dominates or equals.

Definition 3 (G-Skyline)

The G-Skyline consists of all the groups with k points which are not g-dominated by any other group of the same size.

Example 1

The G-Skyline is different from the skyline, and it is also not the same as the skyline groups of [24, 25, 26]. We take the data in Fig. 1 as an example. Let G1 = {p6, p8, p11} and G2 = {p2, p3, p10}. G1 g-dominates G2 because there are two permutations G1 = {p6, p11, p8} and G2 = {p3, p10, p2} such that p6 \( \prec \) p3, p11 \( \prec \) p10, and p8 \( \prec \) p2. So G2 is not a G-Skyline group, whereas G1 belongs to the G-Skyline because no other group of the same size g-dominates G1.
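The g-dominance test of Definition 2 can be illustrated with a brute-force check over permutations. This is only a sketch under assumptions: the function names and the two sample groups are hypothetical (the coordinates of Fig. 1 are not reproduced here), and the factorial cost makes it usable only for small k.

```python
from itertools import permutations
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def dominates(q: Point, p: Point) -> bool:
    """Point dominance of Definition 1 (smaller is better)."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def g_dominates(g1: Sequence[Point], g2: Sequence[Point]) -> bool:
    """Brute-force test of Definition 2 for two groups of equal size.

    Fixes the order of g1 and tries every ordering of g2; accepts when each
    paired point of g1 dominates or equals its partner and at least one
    pair is a strict dominance. Cost is O(k!).
    """
    if len(g1) != len(g2):
        return False
    for perm in permutations(g2):
        pairs = list(zip(g1, perm))
        weak = all(a == b or dominates(a, b) for a, b in pairs)
        strict = any(dominates(a, b) for a, b in pairs)
        if weak and strict:
            return True
    return False


# Hypothetical groups of size 2.
g1 = [(1, 5), (4, 1)]
g2 = [(5, 3), (2, 6)]
print(g_dominates(g1, g2), g_dominates(g2, g1))
```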

Definition 4 (Skyline Layers)

The data set D can be divided into layers. Layer i consists of the skyline points of D with the first i − 1 layers removed, that is, \( layer_{i} = skyline(D - \bigcup\nolimits_{j = 1}^{i - 1} {layer_{j} } ) \). The layers are computed recursively until every point of D belongs to a layer, where \( layer_{1} \) is the traditional skyline of D.
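The recursive definition of skyline layers can be sketched by repeatedly peeling off the skyline of the remaining points. The function name `skyline_layers` and the sample points are assumptions for illustration; the 2-d sorting and binary search used by PWise are not reproduced here.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(q: Point, p: Point) -> bool:
    """Point dominance of Definition 1 (smaller is better)."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def skyline_layers(points: List[Point]) -> List[List[Point]]:
    """Peel skylines: layer i is the skyline of what is left after layers 1..i-1."""
    remaining = list(points)
    layers = []
    while remaining:
        layer = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers


# Hypothetical 2-d sample (not the data of Fig. 1).
sample = [(1, 9), (3, 4), (4, 6), (7, 2), (8, 5), (9, 7)]
layers = skyline_layers(sample)
print(layers)
```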

Theorem 1

Given a data set D and the group size k, each point of G-Skyline groups must be in the first k skyline layers.

Theorem 2

If G is a non-G-Skyline group with k points, then the group with k + 1 points obtained by adding another point from G’s tail set is not a G-Skyline group either.

Theorem 3

For each point p of a G-Skyline group G, all of p’s parents must be in G too.

3.2 Algorithm in static data

To compute the G-Skyline groups with k points from the given n points, the crude method is to enumerate all the \( \binom{n}{k} \) groups and then filter them based on g-dominance. Obviously this costs much time and storage, so paper [1] proposes the PWise (Point-Wise) algorithm to compute the G-Skyline groups efficiently. Next, we introduce this algorithm in brief.
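To give a sense of scale, taking the 11 points of Fig. 1 as the data set, exhaustive enumeration already yields

\[ \binom{11}{2} = \frac{11 \cdot 10}{2} = 55 \quad \text{and} \quad \binom{11}{4} = \frac{11 \cdot 10 \cdot 9 \cdot 8}{4!} = 330 \]

candidate groups for k = 2 and k = 4, each of which would then have to be tested for g-dominance against the others.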
  1. Skyline Layers. First, all the 2-dimensional points in D are sorted in increasing order of x-coordinate; then the points are processed in this order, using binary search to determine which layer each point belongs to. Because the PWise algorithm only computes the G-Skyline for the given group size k, only the first k skyline layers need to be constructed. The point with the minimum y-coordinate in layeri is referred to as the tail point of layeri. An example of skyline layers is shown in Fig. 3, where p11 is the tail point of layer1.
    Fig. 3

    The skyline layers

     
  2. Construct Directed Skyline Graph (DSG). The DSG is a data structure which reflects the dominance relations between the points of the first k layers. It is constructed from the skyline layers: the points in D are processed in increasing order of layer. Every point p is compared with all the points in the previous layers to obtain their dominance relations; p is added to the children list of each point that dominates it, and these points are added to p’s parents list. An example of a DSG based on the data in Fig. 1 is shown in Fig. 4. For clarity, all the indirect dominance relations, such as p11 \( \prec \) p2, are omitted.
    Fig. 4

    Directed skyline graph

     
  3. Compute G-Skyline. Based on the skyline layers and the DSG, the algorithm uses the classic set enumeration tree search framework. According to Theorem 2, the algorithm prunes the non-G-Skyline groups as early as possible, because a group that is not a G-Skyline group should not be expanded further. Then, according to Theorem 3, the algorithm prunes points from the tail set of each node. Finally, the G-Skyline is returned.
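The DSG construction of step 2 can be sketched as follows, assuming the skyline layers are already available as a list of lists. Storing the graph as two dictionaries of sets (`parents` and `children`) is a choice made for this illustration, and the sample layers are hypothetical. As in the description above, every dominating point in an earlier layer is recorded as a parent, including indirect ones.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, ...]


def dominates(q: Point, p: Point) -> bool:
    """Point dominance of Definition 1 (smaller is better)."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def build_dsg(layers: List[List[Point]]):
    """Return (parents, children) maps built layer by layer."""
    parents: Dict[Point, Set[Point]] = {p: set() for layer in layers for p in layer}
    children: Dict[Point, Set[Point]] = {p: set() for layer in layers for p in layer}
    for i, layer in enumerate(layers):
        for p in layer:
            # Only points in earlier layers can dominate p.
            for prev in layers[:i]:
                for q in prev:
                    if dominates(q, p):
                        parents[p].add(q)
                        children[q].add(p)
    return parents, children


# Hypothetical layers (not the data of Fig. 3).
layers = [[(1, 9), (3, 4), (7, 2)], [(4, 6), (8, 5)], [(9, 7)]]
parents, children = build_dsg(layers)
print(parents[(8, 5)])
```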

     

3.3 Our problem

The points in a static data set are stable, but the points in a data stream are dynamic: each point has a life cycle and is only valid within it. Thus, the active data set of the data stream is not static; it changes when a new point arrives or an old point expires. To address this problem, we propose algorithms to find the G-Skyline in the data stream. To the best of our knowledge, this problem is considered here for the first time, and no existing algorithm solves it.

In this paper, we use a sliding window to manage the data in the stream. There are two types of sliding windows [29]: time-based and count-based. We focus on the time-based sliding window.

Definition 5 (Sliding Window)

Let W be the length of the time window and t an arbitrary moment. When a point p arrives at t, its life cycle is [t, t + W], and the point is only valid in this period; that is, p is added to the active data set at moment t and deleted from it at moment t + W.
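A time-based sliding window in the sense of Definition 5 can be sketched with a queue ordered by arrival time. The class name `TimeWindow` and its methods are assumptions for illustration, and the sketch assumes that points arrive in non-decreasing time order, so the oldest point is always at the front.

```python
from collections import deque


class TimeWindow:
    """Active data set under a time-based sliding window of length W."""

    def __init__(self, width):
        self.width = width
        self._queue = deque()  # (expiry_time, point), in arrival order

    def arrive(self, point, t):
        """Add a point arriving at time t; its life cycle is [t, t + W]."""
        self._queue.append((t + self.width, point))

    def expire(self, now):
        """Remove and return the points whose expiry time t + W is <= now."""
        expired = []
        while self._queue and self._queue[0][0] <= now:
            expired.append(self._queue.popleft()[1])
        return expired

    def active(self):
        """Points currently in the active data set."""
        return [p for _, p in self._queue]


window = TimeWindow(width=10)
window.arrive("a", t=0)
window.arrive("b", t=3)
print(window.expire(now=10), window.active())
```

Each call to `arrive` or each point returned by `expire` is an event that would trigger the corresponding G-Skyline update.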

Theorem 4

Given a group G with k points from the data set D, if for each point p ∈ G either all of its parents are in G or p is a traditional skyline point of D, then G is a G-Skyline group.

Proof

Assume a group G = {p1, p2, …, pk} in which every point has all of its parents in G. Suppose another group \( G^{{\prime }} \) = {q1, q2, …, qk} g-dominates G. Then there are two permutations such that, for every i ∈ [1, k], qi \( \preceq \) pi. Whenever qi \( \prec \) pi, qi is a parent of pi and is therefore in G; otherwise qi = pi. Hence every qi is in G, so G and \( G^{{\prime }} \) contain the same points and \( G^{{\prime }} \) is not a different group, a contradiction. So no group g-dominates G, and by the concept of the G-Skyline, G is a G-Skyline group.□
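The condition of Theorem 4 amounts to checking that a group is closed under the parent relation of the DSG. The sketch below assumes the DSG is given as a dictionary mapping each point to the set of all points that dominate it; the function name and the small DSG are hypothetical. A skyline point has an empty parent set, so it passes the test trivially.

```python
from typing import Dict, Hashable, Iterable, Set


def is_closed_under_parents(group: Iterable[Hashable],
                            parents: Dict[Hashable, Set[Hashable]]) -> bool:
    """True if every point of the group has all of its parents in the group."""
    members = set(group)
    return all(parents[p] <= members for p in members)


# Hypothetical DSG: a and b are skyline points, a dominates c, a and b dominate d.
parents = {"a": set(), "b": set(), "c": {"a"}, "d": {"a", "b"}}
print(is_closed_under_parents({"a", "c"}, parents))
print(is_closed_under_parents({"b", "c"}, parents))
```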

4 G-skyline query over a data stream

In this section, we elaborate the algorithms that compute the G-Skyline in the data stream. For convenience, we maintain the following: (1) the layer of a point p, denoted by p.l, which indicates the skyline layer p belongs to; (2) the life cycle of each point, denoted by [p.tarr, p.texp], where p.tarr is the moment p arrives and p.texp is the moment p expires, with p.texp = p.tarr + W.

To compute the G-Skyline efficiently, we present a sharing strategy, based on which we propose two algorithms to find the G-Skyline groups in the data stream.

4.1 Sharing strategy

When the point arrives or expires in a data stream, we can update the G-Skyline based on the existing G-Skyline.

Proof

The dynamics of the data stream show in two aspects: a new point arriving and an old point expiring. Take a point p as an example. Both cases change the active data set, and the G-Skyline of the active data set may change as well. However, not all dominance relationships between the points are affected in these two cases; only the relationships between p and its parents and between p and its children are. According to Theorems 1 and 2, when a new point p arrives, the non-G-Skyline groups remain non-G-Skyline groups, and the G-Skyline groups not containing p’s children remain G-Skyline groups, so we only need to check the other existing G-Skyline groups and the new groups containing p. When an old point p expires, the existing G-Skyline groups not containing p remain G-Skyline groups, and the existing G-Skyline groups containing p are deleted, so we need to check the status of some non-G-Skyline groups.

That is to say, when the active data set changes, we do not have to recompute over all the active data; we can compute the new G-Skyline based on the existing G-Skyline.□

In the G-Skyline processing over the data stream, this sharing strategy will prune most of groups which will not affect the new G-Skyline, and help us to compute the G-Skyline quickly in the data stream.

4.2 Computing G-skyline for point arriving

When a new point p arrives, we should firstly check which layer the point p belongs to, then we update the DSG to construct the new relationships between all the points, finally, we compute the G-Skyline based on the sharing strategy. In order to compute the G-Skyline continuously in the data stream, we should compute the skyline layers and the DSG for all the active points rather than the first k skyline layers and the DSG in the PWise [1].

Update the skyline layers

For the existing active points, the skyline layers have already been constructed, and the points in each layer are sorted in increasing order of x-coordinate. When a new point p arrives, we compare p with the tail point of each layer using binary search to find the layer the new point belongs to: if layeri.tail does not dominate p and layeri−1.tail dominates p, then p belongs to layeri. If p is dominated by the tail point of the last layer, it belongs to a new layer. We then compare p with the points in this layer to determine its position.
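The binary search over layers can be sketched as below. Note one deliberate difference from the description above: instead of comparing p only with each layer's tail point, this sketch tests whether any point of the layer dominates p, which keeps the search valid in any dimension. It relies on the fact that if some point of layer i dominates p, then some point of every earlier layer does too. The function name `locate_layer` and the sample layers are assumptions; moving the points that p itself dominates to deeper layers is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(q: Point, p: Point) -> bool:
    """Point dominance of Definition 1 (smaller is better)."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def locate_layer(p: Point, layers: List[List[Point]]) -> int:
    """Return the 0-based index of the first layer containing no dominator of p.

    A return value equal to len(layers) means p starts a new last layer.
    """
    lo, hi = 0, len(layers)
    while lo < hi:
        mid = (lo + hi) // 2
        if any(dominates(q, p) for q in layers[mid]):
            lo = mid + 1  # p must lie in a deeper layer than mid
        else:
            hi = mid      # p lies in layer mid or a shallower one
    return lo


# Hypothetical layers (not the data of Fig. 3).
layers = [[(1, 9), (3, 4), (7, 2)], [(4, 6), (8, 5)], [(9, 7)]]
print(locate_layer((5, 5), layers))
```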

Example

We show an example of Algorithm 1 in Fig. 5 based on Fig. 1. Assume the active points in the data stream are these 11 points and that a new point p now arrives, so the active data set changes. To update the skyline layers, we first execute the binary search and find that p lies between layer1 and layer2; then we construct the new skyline layers as shown in Fig. 5. In the new skyline layers, the layers of some points have changed. For example, p8 was previously in layer2. When p arrives, the only point in layer2 dominated by p is p8, so p8 moves to layer3. At the same time, p2 and p5, the children of p8, move to the higher layer4, and similarly p4, the child of p5, moves to layer5.
Fig. 5

New skyline layers

Update the directed skyline graph (DSG)

When the new point p arrives, it changes the skyline layers. Because the DSG is built on the skyline layers, we should also update the DSG of the active points.

According to the DSG concept, when a new point p arrives, it does not affect other points except its parents and children, so the DSG can be updated in two steps. First, to find p’s parents, we compare p with each point whose layer is smaller than p’s layer; if a point q is p’s parent, we need not compare p with q’s parents. Second, to find all of p’s children, we compare p with each point whose layer is larger than p’s layer; if a point q is p’s child, we need not compare p with q’s children.
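The two-step update can be sketched as follows, assuming the DSG is stored as two dictionaries of sets in which every dominating point is recorded as a parent. The function name `insert_point` and the sample graph are hypothetical. For brevity the sketch scans all stored points instead of restricting each step by layer, but it applies the same shortcut: once q is found to be a parent (or child) of p, q's own parents (or children) are added without further comparisons.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, ...]


def dominates(q: Point, p: Point) -> bool:
    """Point dominance of Definition 1 (smaller is better)."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def insert_point(p: Point,
                 parents: Dict[Point, Set[Point]],
                 children: Dict[Point, Set[Point]]) -> None:
    """Add a new point p to the DSG in place (p must not already be stored)."""
    new_parents: Set[Point] = set()
    new_children: Set[Point] = set()
    for q in list(parents):
        if q in new_parents or q in new_children:
            continue  # relation already implied by transitivity
        if dominates(q, p):
            new_parents |= {q} | parents[q]
        elif dominates(p, q):
            new_children |= {q} | children[q]
    parents[p], children[p] = new_parents, new_children
    for q in new_parents:
        children[q].add(p)
    for q in new_children:
        parents[q].add(p)


# Hypothetical DSG over four points.
a, b, c, d = (3, 4), (7, 2), (8, 5), (9, 7)
parents = {a: set(), b: set(), c: {a, b}, d: {a, b, c}}
children = {a: {c, d}, b: {c, d}, c: {d}, d: set()}
insert_point((5, 4), parents, children)
print(parents[(5, 4)], children[(5, 4)])
```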

Example

We show an example of DSG updating in Fig. 6 based on Fig. 1. When a new point p arrives, we first update the skyline layers and then update the DSG to reflect the dominance relationships in real time. Because p lies between layer1 and layer2 in the old skyline layers, we find that p is dominated by p11 in layer1 and that p8 in layer2 is dominated by p, where p8 was previously a child of p11. So we set p11 as p’s parent and p8 as p’s child; the children (p2, p5, p4, p7) of p8 are then not compared with p, and we find that p9 is also p’s child. The new DSG is shown in Fig. 6. We can see that when a new point p arrives, some dominance relationships involving p are added to the DSG, while the dominance relationships irrelevant to p remain unchanged.
Fig. 6

Updating DSG when a new point arrives

Compute G-skyline for a point arriving

After updating the skyline layers and the DSG, we can compute the new G-Skyline based on the existing G-Skyline. According to Theorem 1, the layer that p belongs to determines its effect on the new G-Skyline, so we present different solutions depending on p’s location, as follows.
  1. If p.layer > k. According to Theorem 1, the points of G-Skyline groups must be in the first k layers, so if p.layer > k, p does not affect the G-Skyline result.

     
  2. If p.layer ≤ k. In this case, p has no effect on the non-G-Skyline groups or on the G-Skyline groups which do not contain all of p’s parents. The point p may only affect the G-Skyline groups which contain all of p’s parents; we denote these groups as candidate groups.

     

Proof

For a non-G-Skyline group G1, there must be a group G2 that g-dominates it. When p arrives, G2 still g-dominates G1, so p has no effect on such groups.

For each G-Skyline group (say G3) not containing all of p’s parents, no child of p can be in G3: the parents of p are also parents of p’s children, so by Theorem 3 a G-Skyline group containing a child of p would contain all of p’s parents. Hence G3 contains no point dominated by p, and p has no effect on G3.

In particular, p has no effect on the G-Skyline groups that contain none of p’s parents. Hence only the G-Skyline groups containing all of p’s parents need to be re-evaluated.□

We call these groups the candidate groups. We divide them into two kinds and give a different solution for each. If no G-Skyline group contains all of p’s parents, p does not affect the query result.

Solution 1

For each G-Skyline group G which contains all of p’s parents but none of p’s children, p does not affect its status, so G is still a G-Skyline group. In addition, we can replace a leaf point of G by p to compose a new group which is also a G-Skyline group, provided that this leaf point is not one of p’s parents.

Proof

Assume G is a G-Skyline group not containing p’s children; that is, no point in G is dominated by p. Then no group containing p can g-dominate G, so p does not affect G’s status, and G is still a G-Skyline group. On the other hand, if we replace a leaf point of G by p to compose a new group G’, no other group can g-dominate G’ because all of p’s parents are already in G, so G’ is a G-Skyline group too.□

Solution 2

For each G-Skyline group G containing p’s children, p affects its status: G is no longer a G-Skyline group. We replace the leaf point of G by p to compose a new group, which is a G-Skyline group.

Proof

If the G-Skyline group contains a child of p, say G = {g1, g2, …, gi, …, gk} where gi is p’s child, then \( G^{{\prime }} \) = {g1, g2, …, p, …, gk} g-dominates G because p \( \prec \) gi, so G is no longer a G-Skyline group. At the same time, if we replace the leaf point of G by p to compose a new group \( G^{{\prime }} \), then every point in \( G^{{\prime }} \) has all of its parents in \( G^{{\prime }} \), so according to the concept of the G-Skyline and Theorem 4, no group can g-dominate \( G^{{\prime }} \), and \( G^{{\prime }} \) is a G-Skyline group.□

According to the above solutions, when a new point arrives we can quickly find the new G-Skyline based on the existing G-Skyline groups. The process is shown in Algorithm 2 as follows.
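The case analysis behind Solutions 1 and 2 can be sketched as a classification of the existing G-Skyline groups. This is an illustration only, not Algorithm 2 itself: the function name is hypothetical, and the subsequent step of replacing a leaf point by p is not shown. The sample input uses the 2-point G-Skyline groups listed in Sect. 1 and the parent (p11) and children (p8, p9) of the new point p named in the DSG example above.

```python
from typing import Iterable, List, Set, Tuple

Group = Tuple[str, ...]


def classify_groups(groups: Iterable[Group],
                    p_parents: Set[str],
                    p_children: Set[str]):
    """Split existing G-Skyline groups by how a new point p can affect them.

    unaffected: does not contain all of p's parents
    solution1:  contains all parents of p and none of its children
    solution2:  contains all parents of p and at least one child
    """
    unaffected: List[Group] = []
    solution1: List[Group] = []
    solution2: List[Group] = []
    for g in groups:
        members = set(g)
        if not p_parents <= members:
            unaffected.append(g)
        elif members & p_children:
            solution2.append(g)
        else:
            solution1.append(g)
    return unaffected, solution1, solution2


groups = [("p1", "p6"), ("p1", "p11"), ("p6", "p11"),
          ("p6", "p3"), ("p11", "p8"), ("p11", "p10")]
unaffected, solution1, solution2 = classify_groups(groups, {"p11"}, {"p8", "p9"})
print(solution2)
```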

Example

An example of Algorithm 2 is shown in Fig. 7 based on Fig. 1. Assume the active points in the data stream are currently the 11 points which compose the data set D. The k-item G-Skyline groups are shown in Fig. 7, where 1 ≤ k ≤ 4. When a new point p(18, 30) arrives, after updating the skyline layers and the DSG, we can begin to update the relevant groups. The only parent of p is p11, so each G-Skyline group containing p11 should be expanded. For clarity, we omit the duplicate new G-Skyline groups derived from the existing G-Skyline groups. The groups in the dotted box are new groups derived from the existing G-Skyline groups. At level |S|p = 1, p’s arrival has no effect on the 1-item G-Skyline. At level |S|p = 2, among the existing 2-item G-Skyline groups, the group {p11, p8} contains both the parent and a child of p, so according to Solution 2 we replace p8 by p to form the new G-Skyline group {p11, p}, while the group {p11, p8} is no longer a G-Skyline group because it is g-dominated by the new group {p11, p}. Similarly, the group {p6, p11, p8} also contains the parent and a child of p, so we get the new G-Skyline group {p6, p11, p} instead of {p6, p11, p8}. As a result, level |S|p = 4 shows all the 4-item G-Skyline groups without checking.
Fig. 7

Finding G-Skyline when a new point arrives

4.3 Updating G-skyline for point expiring

Each point in the data stream has a life cycle. When an active point expires, the active data set changes too, so we should compute the G-Skyline groups of the new active data set based on the existing result. In this section, we first update the skyline layers, then reconstruct the DSG, and finally compute the G-Skyline for point expiring.

Update the skyline layers

Unlike point arriving, when a point expires we can easily update the skyline layers. Assume the point p in layerL expires. If p is the tail point of layerL, the tail point of each \( layer_{{L^{\prime }}} \) (\( L^{{\prime }} \) > L) is moved to the tail of \( layer_{{L^{\prime } - 1}} \), while all other points stay in their previous layers. If p is not the tail point of layerL, we not only delete p but also change the layers of some points: if a point q is in layerL+1 and its only parent in layerL is p, we move q to layerL. We then change the layers of the affected points in the same way, one layer at a time. The procedure of updating the skyline layers is shown in Algorithm 3.
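One way to sketch the layer update for an expiring point is to use the observation that a point's layer equals one plus the largest layer among the points that dominate it (and 1 if nothing dominates it). This is a substitute formulation chosen here for illustration, not a transcription of Algorithm 3. The function name and the sample DSG are hypothetical, and the maps `parents` and `children` are assumed to record every dominance relation, including indirect ones.

```python
from typing import Dict, Hashable, Set


def relayer_after_expiry(p: Hashable,
                         layer: Dict[Hashable, int],
                         parents: Dict[Hashable, Set[Hashable]],
                         children: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Update layer numbers in place after p expires and return the map.

    Only points dominated by p can change layer. They are visited in
    increasing order of their old layer, so every remaining parent already
    has its final layer when a point is processed.
    """
    affected = sorted(children[p], key=lambda q: layer[q])
    del layer[p]
    for q in affected:
        remaining = parents[q] - {p}
        layer[q] = 1 + max((layer[r] for r in remaining), default=0)
    return layer


# Hypothetical DSG: a dominates c and d, c dominates d, b dominates e.
parents = {"a": set(), "b": set(), "c": {"a"}, "d": {"a", "c"}, "e": {"b"}}
children = {"a": {"c", "d"}, "b": {"e"}, "c": {"d"}, "d": set(), "e": set()}
layer = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 2}
print(relayer_after_expiry("c", dict(layer), parents, children))
```

The DSG itself is left untouched here; removing p from it is a separate step.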

Example

An example of Algorithm 3 is shown in Fig. 8 based on Fig. 1. For the 11 active points, if a tail point expires, we first delete this tail point and then make the tail point of layeri+1 the tail point of layeri, while the other points keep their layers unchanged. As shown in Fig. 8(a), when the tail point p10 expires, p9 becomes the tail of layer2 and p7 becomes the tail of layer3, whereas previously p9 was the tail of layer3 and p7 was the tail of layer4. However, when point p8 expires, we find that p8 is p5’s only parent in layer2, so we move p5 to the lower layer2, but p4 still lies in its original layer because the parents of p4 in layer3 are p5 and p9 while only p5 is in S; the new skyline layers are shown in Fig. 8(b).
Fig. 8

Updating skyline layers. a Point p10 expires. b Point p8 expires

Update the directed skyline graph (DSG)

When an old point expires, it may affect the existing skyline layers, and it may also change the DSG.

The DSG reflects the dominance relationships of each point, so when a point p expires and is removed, there are no longer any relationships between it and its parents or between it and its children. In the DSG, a directed edge indicates a dominance relationship, so when p expires, the edges between p and its parents and the edges between p and its children should be deleted; at the same time, new directed edges from p’s parents to p’s children are added to the DSG, as shown in Fig. 9. In fact, the dominance relationships indicated by the new directed edges already existed; they were only omitted for clarity of visualization. Most importantly, p’s expiration does not change the dominance relationships between other points.
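Removing an expired point from the DSG can be sketched as below, assuming the graph is stored as two dictionaries of sets that record every dominance relation. Under that assumption the parent-to-child edges mentioned above are already stored, so only the edges incident to p need to be deleted. The function name and the sample graph are hypothetical.

```python
from typing import Dict, Hashable, Set


def remove_point(p: Hashable,
                 parents: Dict[Hashable, Set[Hashable]],
                 children: Dict[Hashable, Set[Hashable]]) -> None:
    """Delete p and all of its incident edges from the DSG in place."""
    for q in parents.pop(p):
        children[q].discard(p)  # p is no longer a child of its parents
    for q in children.pop(p):
        parents[q].discard(p)   # p is no longer a parent of its children


# Hypothetical DSG: a dominates c and d, c dominates d.
parents = {"a": set(), "c": {"a"}, "d": {"a", "c"}}
children = {"a": {"c", "d"}, "c": {"d"}, "d": set()}
remove_point("c", parents, children)
print(parents, children)
```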
Fig. 9

The change of edges when a point expires

Example

We give an example of updating the DSG when an old point expires in Fig. 10, based on Fig. 1. Assume that the 11 points in Fig. 1 are the active points in the data stream. When the expiration time of p3 arrives, p3 is deleted. The dominance relationships involving p3 are then removed from the DSG, and new directed edges from its parents (such as p6) to its children (such as p2) are added.
Fig. 10

Updating DSG
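The edge rewiring described in this subsection can be sketched in a few lines of Python. The representation is an assumption made for illustration: the DSG is kept as two dictionaries of sets, `parents` and `children`, and the function name `remove_from_dsg` is hypothetical. The toy graph below reuses the labels p6, p3 and p2 from the example but is not the full graph of Fig. 1.

```python
def remove_from_dsg(parents, children, p):
    """Delete p from the DSG and bridge each parent of p to each child of p."""
    ps, cs = parents.pop(p), children.pop(p)
    for a in ps:
        children[a].discard(p)   # drop the edge a -> p
        children[a] |= cs        # add the edges a -> each child of p
    for c in cs:
        parents[c].discard(p)    # drop the edge p -> c
        parents[c] |= ps         # record the new parents of c

# Toy chain p6 -> p3 -> p2 (hypothetical subset of a larger graph).
parents = {"p6": set(), "p3": {"p6"}, "p2": {"p3"}}
children = {"p6": {"p3"}, "p3": {"p2"}, "p2": set()}
remove_from_dsg(parents, children, "p3")
```

Only the neighbourhood of the expired point is touched, which matches the observation that the dominance relationships between the other points are unchanged.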

Compute G-skyline for a point expiring

As in the point-arriving case, after updating the skyline layers and the DSG, we can compute the new G-Skyline groups from the existing ones. According to the definition of G-dominance and Theorem 3, the expiration of a point p affects only the groups that contain p or p’s children; it does not affect any other group. Next, we analyze which of these groups are new G-Skyline groups.
  1. For the G-Skyline groups

    If G is a G-Skyline group and does not contain p, then by the definition of G-Skyline no group \( G' \) G-dominates G, so the expiration of p does not affect G.

    If G is a G-Skyline group and contains p, we obtain a new k-point G-Skyline group by deleting p from the old (k + 1)-point G-Skyline group.

    Proof From the algorithm for the static dataset, we know that the G-Skyline groups with k + 1 points are built from the G-Skyline groups with k points. Assume that G = {q1, q2, …, qk, p} is a G-Skyline group with k + 1 points, possibly containing children of p; then every point in G has all of its parents in G. When p expires and is deleted from G, we have \( G' = G \setminus \{p\} \) = {q1, q2, …, qk}, and every point in \( G' \) still has all of its remaining parents in \( G' \). According to Theorem 4, \( G' \) is therefore a G-Skyline group with k points after p expires.□

     
  2. For the non-G-Skyline groups

    If G is a non-G-Skyline group and contains p, it can be deleted safely.

    Proof G no longer exists once p expires.

    If G is a non-G-Skyline group and does not contain p, it may become a G-Skyline group when p expires.

    Proof When p expires, it no longer dominates its children, so some non-G-Skyline groups that do not contain p but contain children of p may become G-Skyline groups. According to Theorem 4, such a group G must satisfy the following condition: every point in G has all of its parents, except p, in G. Because the parents of p’s children include the parents of p, adding p to G yields a group \( G' \) with k + 1 points that must be a (k + 1)-point G-Skyline group of the active dataset before p expires. We can therefore obtain these candidate groups by deleting p from the (k + 1)-point G-Skyline groups, and these groups have already been returned in (1). Any other non-G-Skyline group that does not meet this condition cannot become a G-Skyline group and can be pruned safely.□

    When a point p expires, we can thus quickly compute the new G-Skyline groups from the existing ones. Based on this strategy, the key idea of our algorithm is shown in Algorithm 4.
     

Example

We now show an example of Algorithm 4 in Fig. 11, based on the data in Fig. 1. Assume that the active points of the data stream are currently these 11 points. When the old point p6 expires, it is no longer useful and the active dataset changes, so we must compute the new G-Skyline. At level |Sp| = 1, we first delete p6 and then obtain the new 1-point G-Skyline group p3 by removing p6 from {p6, p3}, an old 2-point G-Skyline group; a dashed box denotes the new G-Skyline group. At the same level, we can also remove p6 from {p1, p6} and {p6, p11} to obtain the skyline points p1 and p11, but these two points already belong to the skyline, so for visual clarity we do not mark them. Similarly, at level |Sp| = 2, we obtain the new 2-point G-Skyline groups {p1, p3} and {p11, p3} by removing p6 from the old 3-point G-Skyline groups {p1, p6, p3} and {p6, p11, p3}. Finally, level |Sp| = 3 shows all the new G-Skyline groups with 3 points. With our pruning strategies, the new G-Skyline groups at each level are obtained without much computation.
Fig. 11

Finding G-Skyline when point p6 expires
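The case analysis for an expiring point can be condensed into a short Python sketch. It is not Algorithm 4 itself: it assumes the old k-point and (k + 1)-point G-Skyline groups are available as sets of `frozenset`s, and it uses the characterisation quoted from Theorem 4 (a group is a G-Skyline group when every member has all of its parents in the group) for a brute-force cross-check. All function names and the toy points are hypothetical, with smaller values taken as better.

```python
from itertools import combinations

def dominates(a, b):
    """a dominates b: no worse in every dimension, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def gskyline_bruteforce(points, k):
    """All k-point groups in which every member has all of its parents."""
    ids = sorted(points)
    parents = {i: {j for j in ids if dominates(points[j], points[i])} for i in ids}
    return {frozenset(g) for g in combinations(ids, k)
            if all(parents[i] <= set(g) for i in g)}

def groups_after_expiry(old_k, old_k_plus_1, p):
    """New k-point groups once p expires, derived only from the old groups."""
    kept = {g for g in old_k if p not in g}              # untouched groups
    shrunk = {g - {p} for g in old_k_plus_1 if p in g}   # (k+1)-groups minus p
    return kept | shrunk

# Hypothetical toy data, not the points of Fig. 1.
points = {"a": (1, 1), "b": (2, 2), "c": (3, 3), "d": (0.5, 5), "e": (2.5, 6)}
old2, old3 = gskyline_bruteforce(points, 2), gskyline_bruteforce(points, 3)
new2 = groups_after_expiry(old2, old3, "a")
```

The brute-force function is exponential and serves only as a reference for small inputs; the derivation step itself touches nothing but the stored groups.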

5 Experiments

In this section, we present an experimental evaluation of our algorithms.

5.1 Experiment preparation

We simulate a data stream and evaluate the algorithms when new data arrive or old data expire. For each case, we first evaluate the updating of the skyline layers and then perform comprehensive experiments to test the G-Skyline algorithms on synthetic data. To examine the scalability of our algorithms, we generate three typical types of data: correlated data (COR), independent data (IND) and anti-correlated data (ANTI-COR). A 2-dimensional example of each type is shown in Fig. 12.
Fig. 12

The synthetic dataset. a COR, b ANTI-COR, c IND

For the correlated and anti-correlated datasets, the points are generated by selecting a plane perpendicular to the line from (0,…,0) to (1,…,1) using a normal distribution, while for the independent dataset all attribute values are generated independently using a uniform distribution. For each type of data, we simulate the data stream as follows: a new point is generated randomly at regular intervals to simulate a point arriving in the stream, and a point is deleted at regular intervals to simulate a point expiring; points generated earlier are deleted earlier.
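A simplified Python sketch of this setup follows. It is only an approximation of the generator used in the experiments: the standard deviations, the rejection step that keeps points inside the unit cube, the window size and every function name are assumptions chosen for illustration. The stream is modelled as a first-in, first-out window, so the earliest generated point is the first to expire.

```python
import random
from collections import deque

def ind_point(d, rng):
    """Independent data: every attribute is uniform on [0, 1]."""
    return tuple(rng.random() for _ in range(d))

def plane_point(d, rng, center_sd, spread_sd):
    """Pick a plane perpendicular to the diagonal, then a point on that plane."""
    while True:
        v = rng.gauss(0.5, center_sd)                  # position of the plane
        offsets = [rng.gauss(0.0, spread_sd) for _ in range(d)]
        mean = sum(offsets) / d
        point = tuple(v + o - mean for o in offsets)   # centred offsets stay on the plane
        if all(0.0 <= x <= 1.0 for x in point):        # reject points outside the cube
            return point

def cor_point(d, rng):
    """Correlated: planes vary widely, points stay close to the diagonal."""
    return plane_point(d, rng, center_sd=0.25, spread_sd=0.05)

def anti_point(d, rng):
    """Anti-correlated: planes stay near the centre, points spread along them."""
    return plane_point(d, rng, center_sd=0.05, spread_sd=0.25)

def simulate_stream(make_point, d, window, steps, seed=0):
    """Yield (arrived, expired, active) per step; the oldest point expires first."""
    rng = random.Random(seed)
    active = deque()
    for _ in range(steps):
        arrived = make_point(d, rng)
        active.append(arrived)
        expired = active.popleft() if len(active) > window else None
        yield arrived, expired, tuple(active)

events = list(simulate_stream(anti_point, d=2, window=10, steps=30))
```

Each yielded event gives a point-arriving update and, once the window is full, a point-expiring update, which are the two cases the algorithms handle.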

We also use the real stock data to evaluate the efficiency of our algorithms.

Because this is the first work to compute G-Skyline over a data stream, we evaluate our algorithms against the existing algorithm for static datasets. All experiments are performed on a PC with a 1.7 GHz Intel Core i7 processor, 8 GB of memory and a 1 TB hard drive, running the Windows 7 operating system. The algorithms examined in the experiments are as follows.

PAA

Computing G-Skyline groups for a new point arriving.

PEA

Computing G-Skyline groups for an old point expiring.

PWise

The Point-Wise algorithm for computing G-Skyline over a static dataset [1].

5.2 Updating skyline layers

First, we examine our algorithms for updating the skyline layers when a new point arrives or an old point expires. The PWise algorithm rebuilds all the skyline layers by binary search for the new active dataset, while our algorithms update the skyline layers directly from the existing ones.

Figure 13 shows the running time of updating the skyline layers for the PWise algorithm and our algorithms on the different datasets. When the group size k varies from 2 to 6, the PWise algorithm is affected by the dataset type, and its running time grows only slowly from the correlated dataset to the independent dataset and then to the anti-correlated dataset. The reason is that the PWise algorithm considers only the points in the first k skyline layers and ignores the others. Our algorithms perform better. When a new point p arrives in the data stream, we only need a binary search over the existing skyline layers to find where p belongs. To compute the G-Skyline continuously, the skyline layers in our algorithm must contain all of the active points, so for a given dataset the running time of updating the skyline layers is the same for every group size k, and it is much less than that of PWise. When an old point p expires, our PEA updates the skyline layers by directly deleting p and changing the layers of some points dominated by p. This is simpler still, and its running time is the lowest.
Fig. 13

Updating skyline layers. a COR, b IND, c ANTI-COR

According to the distribution of each dataset, the average number of layers (ln) satisfies COR.ln > IND.ln > ANTI-COR.ln. Consequently, the running time of our algorithms shows little growth across the datasets. Overall, our algorithms perform better than PWise.
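The binary search used to place an arriving point can be sketched as follows. The sketch relies on one property of skyline layers: if some point in a layer dominates p, then some point in every shallower layer does too, so the test "this layer contains a dominator of p" is monotone across layers. The function names and the toy layers are assumptions made for illustration, with smaller values taken as better; the sketch only locates the layer and does not move the existing points that the new point may push deeper.

```python
def dominates(a, b):
    """a dominates b: no worse in every dimension, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def find_layer(layers, p):
    """Return the 0-based index of the first layer holding no dominator of p."""
    lo, hi = 0, len(layers)
    while lo < hi:
        mid = (lo + hi) // 2
        if any(dominates(q, p) for q in layers[mid]):
            lo = mid + 1   # p must lie deeper than this layer
        else:
            hi = mid       # no dominator here, hence none in deeper layers
    return lo              # equals len(layers) when p opens a new layer

# Hypothetical layers, listed from the skyline downwards.
layers = [[(1, 1), (0.5, 5)], [(2, 2)], [(3, 3), (2.5, 6)]]
target = find_layer(layers, (2.2, 2.2))
```

With this search, only a logarithmic number of layers is inspected per arriving point, instead of rebuilding every layer.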

5.3 Performance with respect to the synthetic data

In this section, we show the experimental evaluation of the algorithms on the synthetic datasets. Each dataset is generated following the seminal work [2].

Figure 14 shows the running time of the algorithms on each synthetic dataset for different dataset sizes n, with d = 2 and k = 4. When n is larger than \( 10^{3} \), adding a new point to the active dataset or deleting an old point from it has no effect on the total number of points to be processed by PWise, so the running time of PWise is approximately equal in the two cases and we use the same time value in the figure. Varying n has some effect on PWise because it must process the points in the first k layers, whose total number increases with n; the running time therefore grows slightly on the COR and IND datasets, while PWise needs much more time on the ANTI-COR dataset because every layer has more points than in the other two datasets. Our algorithms perform better. When a new point p arrives in the data stream, PAA only needs to check the groups expanded from the existing G-Skyline groups that contain all of p’s parents; the number of these candidates is not large, so the running time of PAA is less than that of PWise. Similarly, when an old point p expires, PEA only needs to check the existing G-Skyline groups that contain p, which is simple and takes very little time.
Fig. 14

Finding G-Skyline with different dataset size n. a COR, b IND, c ANTI-COR

Figure 15 shows the running time of the algorithms on each synthetic dataset for different dimensions d, with n = 1000 and k = 3. Varying d has a strong effect on PWise because the total number of points in the first k layers increases sharply with d. The running time of our algorithms is lower and increases smoothly. The reason is that our algorithms derive the new G-Skyline from the existing one: although the number of points in the first k layers increases sharply, the number of candidates to be checked by our algorithms grows only slightly.
Fig. 15

Finding G-Skyline with different dimension size d. a COR, b IND, c ANTI-COR

Figure 16 shows the running time of the algorithms on the synthetic datasets for different group sizes k, with n = 1000 and d = 2. The running time of PWise increases sharply with k because the number of points in the first k layers increases quickly. PAA needs slightly more time than PEA because of their different approaches: PAA needs to check more candidates than PEA.
Fig. 16

Finding G-Skyline with different group size k. a COR, b IND, c ANTI-COR

5.4 Performance with respect to the real stock data

To evaluate the efficiency of the algorithms on a real dataset, we run experiments on real stock data from www.finance.yahoo.com. The data contain \( 3 \times 10^{5} \) stock records, and each record has 3 attributes: change, volume and price.

Figure 17 shows the performance of the algorithms on the real data for different dataset sizes n. The dataset size has little impact on the algorithms because it is not very large, and our algorithms perform better. Figure 18 shows the performance of the algorithms on the real data for different group sizes k. The group size has a strong impact on the algorithms, but our algorithms remain more efficient. Overall, our algorithms perform better for G-Skyline queries over a real data stream, because they compute the new G-Skyline from the existing result, so fewer points are used to form candidate groups when a new point arrives or an old point expires.
Fig. 17

Finding G-Skyline with different dataset size n in real data

Fig. 18

Finding G-Skyline with different group size k in real data

6 Conclusions and future work

Processing dynamic data or data streams from a wireless sensor network provides important information for users. In this paper, we proposed the problem of finding G-Skyline groups over a data stream in a wireless sensor network. To compute the G-Skyline groups efficiently, we first presented the sharing strategy and, based on it, proposed two algorithms, PAA and PEA, to compute the new G-Skyline groups when a new point arrives or an old point expires. The experimental results on synthetic and real data show the benefit of our algorithms. In future work, we will consider how to compute the G-Skyline groups in a wireless network when different sensors sample data at different times.

Notes

Acknowledgements

The authors thank all the members of the database laboratory of Donghua University. This work is supported by the Natural Science Foundation of China (No. 61672151) and the Youth Fund of Daqing Normal University (No. 15ZR07).

Compliance with ethical standards

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

  1. Liu, J., Xiong, L., & Pei, J. (2015). Finding pareto optimal groups: Group-based skyline. In VLDB (pp. 2086–2097).
  2. Borzsonyi, S., Kossmann, D., & Stocker, K. (2001). The skyline operator. In ICDE (pp. 421–430).
  3. Chomicki, J., Godfrey, P., & Gryz, J., et al. (2003). Skyline with presorting. In ICDE (pp. 717–719).
  4. Tan, K., Eng, P., & Ooi, B. (2001). Efficient progressive skyline computation. In VLDB (pp. 301–310).
  5. Kossmann, D., Ramsak, F., & Rost, S. (2002). Shooting stars in the sky an online algorithm for skyline queries. In VLDB (pp. 275–286).
  6. Pei, J., Jin, W., Ester, M., & Tao, Y. (2005). Catching the best views of skyline: A semantic approach based on decisive subspaces. In VLDB (pp. 253–264).
  7. Lee, J., & Hwang, S. W. (2014). Toward efficient multidimensional subspace skyline computation. VLDB Journal, 23(1), 129–145.
  8. Xia, T., & Zhang, D. (2006). Refreshing the sky: the compressed skycube with efficient support for frequent updates. In SIGMOD (pp. 491–502).
  9. Li, Y., Li, Z. Y., & Dong, M. X. (2015). Efficient subspace skyline query based on user preference using MapReduce. Ad Hoc Networks, 35, 105–115.
  10. Chan, C. Y., Jagadish, H. V., & Tan, K. L. et al. (2006). Finding k-dominant skylines in high dimensional space. In SIGMOD (pp. 503–514).
  11. Miao, X. Y., Gao, Y., et al. (2016). k-dominant skyline queries on incomplete data. Information Sciences, 367, 990–1011.
  12. Lee, J., You, G., & Hwang, S. (2008). Personalized top-k skyline queries in high-dimensional space. Information Systems, 1, 45.
  13. Jiang, T., Zhang, B., & Lin, D. (2015). Incremental evaluation of top-k combinatorial metric skyline query. Knowledge-Based Systems, 74, 89–105.
  14. Zhang, W., Lin, X., Zhang, Y., et al. (2010). Threshold based probabilistic top k dominating query. The VLDB Journal, 19(2), 283–305.
  15. Jiang, T., Zhang, B., Gao, Y., et al. (2013). Efficient top k query processing on mutual skyline. Journal of Computer Research and Development, 50(5), 986–997. (In Chinese).
  16. Le, T. M. N., Cao, J., & He, Z. (2016). Answering skyline queries on probabilistic data using the dominance of probabilistic tuples. Information Sciences, 340–341, 58–85.
  17. Pei, J., Jiang, B., Lin, X., & Yuan, Y. (2007). Probabilistic skylines on uncertain data. In VLDB (pp. 15–26).
  18. Pujari, A. K., Kagita, V. R., & Garg, A. (2015). Efficient computation for probabilistic skyline over uncertain preferences. Information Sciences, 324, 146–162.
  19. Lin, X., Yuan, Y., Wang, W., & Lu, H. (2005). Stabbing the sky: Efficient skyline computation over sliding windows. In ICDE (pp. 502–513).
  20. Morse, M., Patel, J.-M., & Grosky, W.-I. (2007). Efficient continuous skyline computation. Information Sciences, 177(17), 3411–3437.
  21. Li, H., Yoo, J. (2014). An efficient scheme for continuous skyline query processing over dynamic data set. In Proceedings of the international conference on big data and smart computing (pp. 54–59). Bangkok, Thailand.
  22. Tiziano, D. M., Salvatore, D. G., & Gabriele, M. (2015). A multicore parallelization of continuous skyline queries on data streams. LNCS, 9233, 402–413.
  23. Su, I.-F., Chung, Y.-C., & Lee, C. (2010). Top-k combinatorial skyline queries. In Proceedings of the 15th international conference on database systems for advanced applications (DASFAA 2010).
  24. Im, H., & Park, S. (2011). Group skyline computation. Information Sciences, 188, 151–169.
  25. Li, C., Rajasekaran, S., Zhang, N., et al. (2014). On skyline groups. TKDE, 4(26), 942–956.
  26. Chung, Y.-C., Su, I.-F., & Lee, C. (2013). Efficient computation of combinatorial skyline queries. Information System, 38, 369–387.
  27. Magnani, M., & Assent, I. (2013). From stars to galaxies: Skyline queries on aggregate data. In EDBT (pp. 477–488).
  28. Guo, Xi, Li, Hailing, Wulamu, Aziguli, et al. (2016). Efficient processing of skyline group queries over a data stream. Tsinghua Science and Technology, 21(1), 29–39.
  29. Babcock, B., Babu, S., & Datar, M., et al. (2002). Models and issues in data stream systems. In Proceedings of the ACM SIGACT-SIGMOD symposium on principles of database systems. Wisconsin.
  30. Son, W., Stehn, F., & Knauer, C. (2017). Top-k manhattan spatial skyline queries. Information Processing Letters, 123(1), 27–35.
  31. Lee, K. H., Kim, J., & Kim, M. H. (2017). Simultaneous processing of multi-skyline queries with mapreduce. In IEICE transactions on information and systems (Vol. E100-D, No. 7).
  32. Zaman, A., & Siddique, M. A. (2017). Annisa. Finding key persons on social media by using MapReduce skyline. International Journal of Networking and Computing, 7(1), 86–104.
  33. Zhu, H., Zhu, P., & Li, X. (2017). Computing skyline groups: an experimental evaluation. In ACM turing celebration conference-China (p. 48).
  34. Zhu, H., Zhu, P., & Li, X. (2017). Parallelization of group-based skyline computation for multi-core processors. Concurrency and Computation Practice and Experience, 3, e4195.
  35. Yin, B., Zhou, S., & Zhang, S. (2017). On efficient processing of continuous reverse skyline queries in wireless sensor networks. KSII Transactions on Internet and Information Systems, 11(4), 1931–1953.
  36. Wang, Y., Song, B., & Wang, J. (2016). Geometry-based distributed spatial skyline queries in wireless sensor networks. Sensors, 16(4), 454.
  37. Ahmed, K., Nafi, N., & Gregory, M. (2016). Enhanced distributed dynamic skyline query for wireless sensor networks. Journal of Sensor and Actuator Networks, 5(1), 2.
  38. Yin, B., Zhou, S., & Zhang, S. (2017). On efficient processing of continuous reverse skyline queries in wireless sensor networks. KSII Transactions on Internet and Information Systems, 11(4), 1931–1953.
  39. Wang, Y., Wei, W., & Deng, Q. (2016). An energy-efficient skyline query for massively multidimensional sensing data. Sensors, 16(1), 83.
  40. Example, F., & Quality, C. (2015). A environment. Alternative tuples based probabilistic skyline query processing in wireless sensor networks. Mathematical Problems in Engineering, 2015, 1–10.
  41. Wulamu, A., & Li, H., et al. (2016). Processing skyline groups on data streams. ubiquitous intelligence and computing and 2015. In 2015 IEEE 12th international conference on autonomic and trusted computing and 2015, IEEE 15th international conference on scalable computing and communications and its associated workshops (UIC-ATC-ScalCom) (pp. 935–942).
  42. Nagendra, Mithila. (2014). Efficient processing of skyline queries on static data sources, data streams and incomplete datasets. Arizona: Arizona State University.

Copyright information

© The Author(s) 2018

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. School of Information Science and Technology, Donghua University, Shanghai, China
  2. School of Computer Science and Information Technology, Daqing Normal University, Daqing, China
  3. School of Computer Science and Technology, Donghua University, Shanghai, China
  4. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
