1 Introduction

Improving the efficiency and security of power delivery is a critically important goal, in the face of disturbances arising from severe weather, human error, equipment failure, or even intentional intrusion. Estimates [5] suggest that reducing outages in the U.S. grid could save \(\$49\) billion per year and reduce emissions by 12 to \(18\%\), while improving efficiency could save an additional \(\$20.4\) billion per year. A key part of achieving this goal is to use sensor monitoring data to quickly identify when parts of the grid fail, so that the problem can be addressed promptly.

A major challenge is scalability: power systems data can be both high-volume and real-time, since it comes from sensors that continuously monitor the grid. This motivates us to develop fast methods that work in this online (or streaming) setting: when each new data point is received, the algorithm should update itself efficiently.

Hence, our goal is an online anomaly detection algorithm:

Informal Problem 1

(Online Anomaly Detection)

  • Given: A graph \(\mathcal {G}= (\mathcal {V}, \mathcal {E})\), and a subset \(\mathcal {S}\) of nodes which contain sensors. Each sensor provides a continuous stream of complex (real and imaginary) voltage values V(t) and current values I(t).

  • Find: At each time t, compute an anomalousness score A(t), indicating our confidence level that an anomaly occurred (i.e. a transmission line failed).

For cost reasons, it is generally infeasible to place sensors at every node. Hence, an important follow-up question is where to place sensors so as to maximize the probability of detecting an anomaly.

Informal Problem 2

(Sensor Placement)

  • Given: A budget k of the number of sensors we can afford, a graph \(\mathcal {G}= (\mathcal {V}, \mathcal {E})\), and a simulator that allows us to simulate sensor readings at each node.

  • Find: A set of nodes \(\mathcal {S}\subseteq \mathcal {V}\), which are the locations we should place our sensors, such that \(|\mathcal {S}|=k\).

In contrast to most prior work, our anomaly detection algorithm, GridWatch-D, is domain-aware: it exploits the fact that an electrical sensor provides a voltage reading at a node as well as the current along each adjacent edge. This allows us to detect anomalies more accurately, even in the online setting. Next, we propose GridWatch-S, a sensor placement algorithm. The main idea is to define an objective which estimates our probability of successfully detecting an anomaly, then show that this objective is submodular, allowing us to optimize it with approximation guarantees using an efficient greedy algorithm.

Figure 1a shows the sensors selected by GridWatch-S: red circles indicate positions chosen. Figure 1b shows the anomaly scores (black line) output by GridWatch-D, which accurately match the ground truth. Figure 1c shows that GridWatch-S outperforms baselines on the case2869 data.

Our contributions are as follows:

  1.

    Online anomaly detection: we propose a novel, online anomaly detection algorithm, GridWatch-D, that outperforms existing approaches.

  2.

    Sensor placement: we construct an optimization objective for sensor placement, with the goal of maximizing the probability of detecting an anomaly. We show that this objective has the property of ‘submodularity,’ which we exploit to propose our sensor placement algorithm.

  3.

    Effectiveness: Our sensor placement algorithm, GridWatch-S, is provably near-optimal. In addition, both our algorithms outperform existing approaches in accuracy by \(59\%\) or more (F-measure) in experiments.

  4.

    Scalability: Our algorithms scale linearly, and GridWatch-D is online, requiring bounded space and constant time per update.

Reproducibility: Our code and data are publicly available at github.com/bhooi/gridwatch.

Fig. 1.

(a) GridWatch-S provably selects near-optimal sensor locations. Red circles indicate positions chosen for sensors, in the case2869 graph. (b) GridWatch-D computes anomaly scores (black line) on case2869. Red crosses indicate ground truth: notice 100% true alarms (all black spikes above the blue line are true alarms) and only 4 false dismissals (red crosses below the blue line). (c) F-measure of GridWatch-S compared to baselines on case2869. (Color figure online)

2 Background and Related Work

Time Series Anomaly Detection. Numerous algorithms exist for anomaly detection in univariate time series [17]. For multivariate time series, LOF [8] uses a local density approach, while Isolation Forests [20] partition the data using a set of trees. Other approaches include neural networks [34], distance-based methods [28], and exemplar-based methods [15]. However, none of these consider sensor selection.

Anomaly Detection in Temporal Graphs. [4] finds anomalous changes in graphs using an egonet (i.e. neighborhood) based approach, while [10, 22] use community-based approaches. [3] finds change points in dynamic graphs, and other partition-based [2] and sketch-based [29] methods also exist for anomaly detection. However, these methods require fully observed edge weights (i.e. all sensors present), and do not consider sensor selection.

Power Grid Monitoring. A number of works consider the Optimal PMU Placement (OPP) problem [9] of optimally placing sensors in power grids, typically to make as many nodes as possible fully observable, or to minimize mean-squared error. Greedy [19], convex relaxation [16], integer programming [12], and simulated annealing [7] approaches have been proposed. However, these do not perform anomaly detection. [21, 27, 36] consider OPP in the presence of branch outages, but not anomalies in general, and due to their use of integer programming, handle only small graphs of at most 60 nodes.

Epidemic and Outbreak Detection. [18] proposed CELF, for outbreak detection in networks, such as water distribution networks and blog data, also using a submodular objective function. Their setting is a series of cascades spreading over the graph, while our input data is time-series data from sensors at various edges of the graph. For epidemics, [11, 25] consider targeted immunization, such as identifying high-degree [25] or well-connected [11] nodes. We show experimentally that our sensor selection algorithm outperforms both approaches.

Table 1. Comparison of related approaches: only GridWatch satisfies all the listed properties.

Table 1 summarizes related work. GridWatch differs from existing methods in that it performs anomaly detection using an online algorithm, and it selects sensor locations with a provable approximation guarantee.

2.1 Background: Submodular Functions

A function f defined on subsets of \(\mathcal {V}\) is submodular if whenever \(\mathcal {T}\subseteq \mathcal {S}\) and \(i \notin \mathcal {S}\):

$$\begin{aligned} f(\mathcal {S}\cup \{i\}) - f(\mathcal {S}) \le f(\mathcal {T}\cup \{i\}) - f(\mathcal {T}) \end{aligned}$$
(1)

Intuitively, this can be interpreted as diminishing returns: the left side is the gain in f from adding i to \(\mathcal {S}\), and the right side is the gain from adding i to \(\mathcal {T}\). Since \(\mathcal {T}\subseteq \mathcal {S}\), this says that as \(\mathcal {T}\) ‘grows’ to \(\mathcal {S}\), the gains from adding i can only diminish.
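As a toy illustration of diminishing returns, consider a set-cover function (a standard example of a submodular function; the sets below are made up):

```python
def coverage(S, sets):
    """f(S) = number of distinct elements covered by the chosen sets."""
    return len(set().union(*(sets[i] for i in S))) if S else 0

sets = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'c'}}
T, S = {1}, {1, 2}                                    # T is a subset of S
gain_S = coverage(S | {3}, sets) - coverage(S, sets)  # adding 3 to the larger set
gain_T = coverage(T | {3}, sets) - coverage(T, sets)  # adding 3 to the smaller set
print(gain_S, gain_T)  # gain_S <= gain_T: diminishing returns
```

Here item 3 adds nothing once its element is already covered by the larger set, but still helps the smaller set.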

[23] showed that nondecreasing submodular functions can be optimized by a greedy algorithm with a constant-factor approximation guarantee of \((1-1/e)\). These were extended by [32] to the non-constant sensor cost setting.

3 GridWatch-D Anomaly Detection Algorithm

Preliminaries. Table 2 shows the symbols used in this paper.

Table 2. Symbols and definitions

In this section, we are given a graph \(\mathcal {G}=(\mathcal {V},\mathcal {E})\) and a fixed set of sensors \(\mathcal {S}\subseteq \mathcal {V}\). Each sensor consists of a central node i on which voltage \(V_i(t) \in \mathbb {C}\) is measured, at each time t. Note that complex voltages and currents are used to take phase into account, following standard practice in circuit analysis (this paper will not presume familiarity with this). Additionally, for sensor i, letting \(\mathcal {N}_i\) be the set of edges adjacent to i, we are given the current \(I_e \in \mathbb {C}\) along each edge \(e \in \mathcal {N}_i\).

For sensor i and edge \(e \in \mathcal {N}_i\), define the power w.r.t. i along edge e as \(S_{ie}(t) = V_i(t) \cdot I_e(t)^*\), where \(^*\) is the complex conjugate. We find that using power (rather than current) provides better anomaly detection in practice. However, when considering the edges around a single sensor i, variations in current result in similar variations in power, so they perform the same role.
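In code, this quantity is a one-line computation with Python's built-in complex type; the numeric readings below are hypothetical stand-ins for actual sensor values:

```python
# Hypothetical readings at one sensor; real values come from the grid.
V_i = 1.02 + 0.05j           # complex voltage at node i
I_e = 0.30 - 0.10j           # complex current along adjacent edge e

# Power w.r.t. sensor i along edge e: S_ie = V_i * conj(I_e).
S_ie = V_i * I_e.conjugate()
print(S_ie.real, S_ie.imag)  # active and reactive components
```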

3.1 Types of Anomalies

Our goal is to detect single edge deletions, i.e. a transmission line failure. Single edge deletions affect the voltage and current in the graph in a complex, nonlinear way, and can manifest themselves in multiple ways. Consider the illustrative power grid shown by the graphs in Fig. 2. The power grid consists of a single generator, a single load, and power lines of uniform resistance. When the edge marked with a black cross fails, current is diverted from some edges to others, causing some edges to have increased current flow (blue edges), and thus increased power, and others to have decreased current flow (red edges). Current flows are computed using a standard power grid simulator, Matpower [37].

Fig. 2.

Domain-aware model for anomalies: edge failures form 3 patterns. Edge color indicates change in current due to the edge failure: blue is an increase; red is a decrease. Left: diversion into a single edge. Center: diversion from the right to the left side of the graph, forming a group anomaly. Right: diversion between paths, forming a group diversion anomaly. (Color figure online)

In the leftmost plot, the edge deletion diverts a large amount of current into a single edge, resulting in a highly anomalous value \((+0.4)\) along a single edge. To detect single-edge anomalies, we consider the largest absolute change in power in the edges adjacent to this sensor. Formally, letting \(\varDelta S_{ie}(t) = S_{ie}(t) - S_{ie}(t-1)\), we define:

Definition 1

(Single-Edge Detector). The detector at sensor i is:

$$\begin{aligned} x_{SE,i}(t) = \max _{e \in \mathcal {N}_i} |\varDelta S_{ie}(t)| \end{aligned}$$
(2)

In the middle plot, the edge deletion cuts off a large amount of current that would have gone from the generator toward the right side of the graph, diverting it into the left side of the graph. This results in some nodes in the left region with all their neighboring edges having positive changes (blue), such as the leftmost node. Individually, these changes may be too small to appear anomalous, but in aggregate, they provide stronger evidence of an anomaly. Hence, the group anomaly detector computes the sum of power changes around sensor i, then takes the absolute value:

Definition 2

(Group Anomaly Detector). The detector at sensor i is:

$$\begin{aligned} x_{GA,i}(t) = |\sum _{e \in \mathcal {N}_i}(\varDelta S_{ie}(t))| \end{aligned}$$
(3)

In the right plot, the edge deletion diverts current between nearby edges. In particular, current diversions around the central node cause it to have neighbors which greatly differ from each other: 2 positive edges and 2 negative edges. If this diversion is large enough, this provides stronger evidence of an anomaly than simply looking at each edge individually. Hence, the group diversion detector measures the ‘spread’ around sensor i by looking at the total absolute deviation of power changes about sensor i:

Definition 3

(Group Diversion Detector). The detector at sensor i is:

$$\begin{aligned} x_{GD,i}(t) = \sum _{e \in \mathcal {N}_i}\Big |\varDelta S_{ie}(t) - \underset{e' \in \mathcal {N}_i}{\text {mean}}(\varDelta S_{ie'}(t))\Big | \end{aligned}$$
(4)
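The three detectors can be sketched in a few lines of NumPy. This is an illustrative helper, not the authors' implementation; `delta_S` is assumed to hold the power changes \(\varDelta S_{ie}(t)\) on one sensor's adjacent edges:

```python
import numpy as np

def detectors(delta_S):
    """Compute the three GridWatch-D detectors for one sensor.

    delta_S: 1-D array of power changes on the sensor's adjacent
    edges; complex entries also work, since np.abs takes the modulus.
    """
    delta_S = np.asarray(delta_S)
    x_se = np.max(np.abs(delta_S))                   # single-edge, Eq. (2)
    x_ga = np.abs(np.sum(delta_S))                   # group anomaly, Eq. (3)
    x_gd = np.sum(np.abs(delta_S - delta_S.mean()))  # group diversion, Eq. (4)
    return np.array([x_se, x_ga, x_gd])

print(detectors([0.4, -0.1, 0.1]))
```

Note how the detectors differ on the same input: a single large change dominates the first, near-cancelling changes suppress the second, and spread among edges drives the third.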

3.2 Proposed Anomaly Score

Having computed our detectors, we now define our anomaly score. For each sensor i, concatenate its detectors into a vector:

$$\begin{aligned} X_i(t) = [x_{SE,i}(t) \ \ x_{GA,i}(t) \ \ x_{GD,i}(t)] \end{aligned}$$
(5)

Sensor i should label time t as an anomaly if any of the detectors greatly deviate from their historical values. Hence, let \(\tilde{\mu }_i(t)\) and \(\tilde{\sigma }_i(t)\) be the historical median and inter-quartile range (IQR) [35] of \(X_i(t)\) respectively: i.e. the median and IQR of \(X_i(1), \cdots , X_i(t-1)\). We use the median and IQR rather than the mean and standard deviation because they are robust to outliers, which matters precisely because the history may contain the anomalies we aim to detect.

Thus, define the sensor-level anomalousness as the maximum number of IQRs that any detector is away from its historical median. The infinity-norm \(\Vert \cdot \Vert _\infty \) denotes the maximum absolute value of a vector.

Definition 4

(Sensor-level anomalousness). Sensor-level anomalousness is:

$$\begin{aligned} a_{i}(t) = \left\| \frac{X_i(t) - \tilde{\mu }_i(t)}{ \tilde{\sigma }_i(t)} \right\| _\infty \end{aligned}$$
(6)

Finally, the overall anomalousness at time t is the maximum of \(a_i(t)\) over all sensors. Taking the maximum allows us to determine the location (not just the time) of an anomaly, by looking at which sensor contributed toward the maximum.

Definition 5

(Overall anomalousness). Overall anomalousness at time t is:

$$\begin{aligned} A(t) = \max _{i\in \mathcal {S}} a_{i}(t) \end{aligned}$$
(7)

Algorithm 1 summarizes our GridWatch-D anomaly detection algorithm. Note that we can maintain the median and IQR of a set of numbers in a streaming manner using reservoir sampling [33]. Hence, the Normalize operation in Line 5 takes a value of \(\varDelta S_{ie}(t)\), subtracts its historical median and divides by the historical IQR for that sensor. This ensures that sensors with large averages or spread do not dominate.

(Algorithm 1: GridWatch-D pseudocode figure)
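A simplified per-sensor scorer for Eq. (6) can be sketched as follows; for clarity this version keeps the full history, whereas GridWatch-D maintains the median and IQR in bounded memory via reservoir sampling:

```python
import numpy as np

class SensorScorer:
    """Simplified sensor-level anomalousness, Eq. (6).

    Stores the full history of detector vectors; the paper instead
    keeps a bounded reservoir sample to stay O(1) per update.
    """
    def __init__(self):
        self.history = []

    def score(self, x):
        x = np.asarray(x, dtype=float)
        if len(self.history) < 2:          # not enough history yet
            self.history.append(x)
            return 0.0
        h = np.vstack(self.history)
        med = np.median(h, axis=0)
        q75, q25 = np.percentile(h, [75, 25], axis=0)
        iqr = np.where(q75 - q25 == 0, 1.0, q75 - q25)  # guard zero spread
        a = np.max(np.abs((x - med) / iqr))             # infinity norm
        self.history.append(x)
        return float(a)
```

A detector vector far from its historical median (in IQR units) yields a large score, while typical vectors score near zero.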

Lemma 1

GridWatch-D is online, and requires bounded memory and time.

Proof

We verify from Algorithm 1 that GridWatch-D’s memory consumption is \(O(|\mathcal {S}|)\), and updates in \(O(|\mathcal {S}|)\) time per iteration, which are bounded (regardless of the length of the stream).    \(\blacksquare \)
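The bounded-memory claim relies on keeping a fixed-size uniform sample of each detector's history, over which the median and IQR are computed. A standard reservoir sampler (Vitter's Algorithm R; a sketch, not the authors' code) suffices:

```python
import random

class Reservoir:
    """Uniform fixed-size sample of an unbounded stream (Algorithm R)."""
    def __init__(self, size, seed=0):
        self.size = size
        self.sample = []
        self.n = 0                          # items seen so far
        self.rng = random.Random(seed)

    def add(self, x):
        self.n += 1
        if len(self.sample) < self.size:
            self.sample.append(x)           # fill phase
        else:
            j = self.rng.randrange(self.n)  # uniform in [0, n)
            if j < self.size:
                self.sample[j] = x          # keep x with prob size/n
```

Memory stays at `size` items per sensor regardless of the stream length, which is what makes the per-update cost constant.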

4 Sensor Placement: GridWatch-S

So far, we have detected anomalies using a fixed set of sensors. We now consider how to choose locations for a fixed budget of k sensors. Our main idea is to construct an optimization objective for the anomaly detection performance of a subset \(\mathcal {S}\) of sensor locations, and to show that this objective is submodular, so that a greedy approach gives approximation guarantees.

Note the change in problem setting: we are no longer monitoring for anomalies online in time series data, since we are now assuming that the sensors have not even been installed yet. Instead, we are an offline planner deciding where to place the sensors. To do this, we use a model of the system in the form of its graph \(\mathcal {G}\), plugging it into a simulator such as Matpower [37] to generate a dataset of ground truth anomalies and normal scenarios, where the former contain a randomly chosen edge deletion, and the latter do not.

4.1 Proposed Optimization Objective

Intuitively, we should select sensors \(\mathcal {S}\) to maximize the probability of detecting an anomaly. This probability can be estimated as the fraction of ground truth anomalies that we successfully detect. Hence, our optimization objective, \(f(\mathcal {S})\), will be the fraction of anomalies that we successfully detect when using GridWatch-D, with sensor set \(\mathcal {S}\). We will now formalize this and show that it is submodular.

Specifically, define \(X_i(r)\) as the detector vector of sensor i on the rth simulated anomaly, analogous to (5). Define \(\tilde{\mu }_i\) and \(\tilde{\sigma }_i\) as the median and IQR of sensor i on the full set of normal scenarios, and let \(a_i(r)\) be the sensor-level anomalousness of the rth anomaly, computed as in Definition 4 by plugging in \(\tilde{\mu }_i\) and \(\tilde{\sigma }_i\):

$$\begin{aligned} a_i(r) = \left\| \frac{X_i(r) - \tilde{\mu }_i}{\tilde{\sigma }_i}\right\| _\infty \end{aligned}$$
(8)

Define overall anomalousness w.r.t. \(\mathcal {S}\), \(A(r, \mathcal {S})\), analogously to Definition 5:

$$\begin{aligned} A(r, \mathcal {S}) = \max _{i \in \mathcal {S}} a_i(r) \end{aligned}$$
(9)

Given threshold c, anomaly r will be detected by sensor set \(\mathcal {S}\) if and only if \(A(r,\mathcal {S}) > c\). Hence, our optimization objective is to maximize the fraction of detected anomalies:

$$\begin{aligned} \underset{\mathcal {S}\subseteq \mathcal {V}, |\mathcal {S}| = k}{\text {maximize}} \ \ f(\mathcal {S}), \text { where } f(\mathcal {S}) = \frac{1}{s}\sum _{r=1}^s \mathbf {1}\{ A(r, \mathcal {S}) > c \} \end{aligned}$$
(10)
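Assuming the sensor-level scores \(a_i(r)\) have been precomputed from the simulator output, f can be evaluated in a few lines (a hypothetical helper for illustration):

```python
import numpy as np

def detection_fraction(a, S, c):
    """f(S) of Eq. (10): fraction of anomalies with max_{i in S} a_i(r) > c.

    a: (num_sensors, num_anomalies) array of precomputed scores a_i(r).
    S: iterable of selected sensor indices; c: detection threshold.
    """
    S = list(S)
    if not S:
        return 0.0
    A = a[S].max(axis=0)         # A(r, S) for every anomaly r
    return float(np.mean(A > c))

a = np.array([[5.0, 0.0, 0.0],   # toy scores: sensor 0 detects anomaly 0
              [0.0, 5.0, 0.0]])  # sensor 1 detects anomaly 1
print(detection_fraction(a, [0, 1], c=1.0))
```

In the toy example, neither sensor detects the third anomaly, so even the full sensor set scores 2/3.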

4.2 Properties of Objective

Our optimization objective \(f(\mathcal {S})\) is submodular: informally, it exhibits diminishing returns. The more sensors we add, the smaller the marginal gain in detection probability.

Theorem 1

Detection probability \(f(\mathcal {S})\) is submodular, i.e. for all subsets \(\mathcal {T}\subseteq \mathcal {S}\) and nodes \(i \in \mathcal {V}\setminus \mathcal {S}\):

$$\begin{aligned} f(\mathcal {S}\cup \{i\}) - f(\mathcal {S}) \le f(\mathcal {T}\cup \{i\}) - f(\mathcal {T}) \end{aligned}$$
(11)

Proof

$$\begin{aligned} f(\mathcal {S}\cup \{i\}) - f(\mathcal {S})&= \frac{1}{s} \sum _{r=1}^s \left( \mathbf {1}\{A(r, \mathcal {S}\cup \{i\})>c\} - \mathbf {1}\{A(r, \mathcal {S})>c\} \right) \\&= \frac{1}{s} \sum _{r=1}^s \left( \mathbf {1}\{ \max _{j \in \mathcal {S}\cup \{i\}} a_j(r)> c \} - \mathbf {1}\{ \max _{j \in \mathcal {S}} a_j(r)> c\} \right) \\&= \frac{1}{s} \sum _{r=1}^s \left( \mathbf {1}\{ a_i(r)> c \wedge \max _{j \in \mathcal {S}} a_j(r) \le c \} \right) \\&\le \frac{1}{s} \sum _{r=1}^s \left( \mathbf {1}\{ a_i(r) > c \wedge \max _{j \in \mathcal {T}} a_j(r) \le c \} \right) \\&= f(\mathcal {T}\cup \{i\}) - f(\mathcal {T}) \end{aligned}$$

   \(\blacksquare \)

Theorem 2

\(f(\mathcal {S})\) is nondecreasing, i.e. \(f(\mathcal {T}) \le f(\mathcal {S})\) for all subsets \(\mathcal {T}\subseteq \mathcal {S}\).

Proof

$$\begin{aligned} f(\mathcal {S}) = \frac{1}{s} \sum _{r=1}^s \mathbf {1}\{ \max _{j \in \mathcal {S}} a_j(r)> c \} \ge \frac{1}{s} \sum _{r=1}^s \mathbf {1}\{ \max _{j \in \mathcal {T}} a_j(r) > c \} = f(\mathcal {T}) \end{aligned}$$

where the inequality holds since \(\max _{j \in \mathcal {S}} a_j(r) \ge \max _{j \in \mathcal {T}} a_j(r)\) whenever \(\mathcal {T}\subseteq \mathcal {S}\).

   \(\blacksquare \)

4.3 Proposed GridWatch-S Algorithm

We exploit this submodularity using an efficient greedy algorithm that starts from \(\mathcal {S}\) as the empty set, and iteratively adds the best sensor to maximize \(f(\mathcal {S})\), until the budget constraint \(|\mathcal {S}|=k\) is reached. Algorithm 2 describes our GridWatch-S algorithm.

(Algorithm 2: GridWatch-S pseudocode figure)
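The greedy loop itself is short. The sketch below works for any set function f, such as the objective in Eq. (10); the toy objective used here is made up for illustration:

```python
def greedy_placement(nodes, f, k):
    """Greedy selection: start empty, add the sensor with the largest
    marginal gain in f until k sensors are chosen. For nondecreasing
    submodular f this is (1 - 1/e)-optimal."""
    S = frozenset()
    for _ in range(k):
        best = max((n for n in nodes if n not in S),
                   key=lambda n: f(S | {n}) - f(S))
        S |= {best}
    return S

# Toy objective (made up): each candidate node 'covers' some anomalies.
covers = {0: {1, 2}, 1: {2}, 2: {3}}
f = lambda S: len(set().union(*(covers[n] for n in S))) if S else 0
print(sorted(greedy_placement([0, 1, 2], f, k=2)))
```

Note that candidate 1 is skipped: its coverage is redundant once 0 is chosen, exactly the diminishing-returns behavior the guarantee rests on.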

4.4 Approximation Bound

Since f is nondecreasing and submodular, Algorithm 2 achieves at least \(1-1/e\) \(({\approx } 63\%)\) of the value of the optimal sensor placement. Letting \(\hat{\mathcal {S}}\) be the set returned by Algorithm 2, and \(\mathcal {S}^*\) be the optimal set:

Theorem 3

$$\begin{aligned} f(\hat{\mathcal {S}}) \ge (1-1/e) f(\mathcal {S}^*) \end{aligned}$$
(12)

Proof

This follows from [23] since f is nondecreasing and submodular.    \(\blacksquare \)

5 Experiments

We design experiments to answer the following questions:

  • Q1. Anomaly Detection Accuracy: on a fixed set of sensors, how accurate are the anomalies detected by GridWatch-D compared to baselines?

  • Q2. Sensor Selection: how much does sensor selection using GridWatch-S improve the anomaly detection performance compared to baselines?

  • Q3. Scalability: how do our algorithms scale with the graph size?

Our code and data are publicly available at github.com/bhooi/gridwatch. Experiments were done on a 2.4 GHz Intel Core i5 Macbook Pro, 16 GB RAM running OS X 10.11.2.

Data. We use 2 graphs, case2869 and case9241, which accurately represent different parts of the European high voltage network [37]. case2869 contains 2869 nodes (generators or buses) and 2896 edges (power lines or transformers). case9241 contains 9241 nodes and 16049 edges.

5.1 Q1. Anomaly Detection Accuracy

In this section, we compare GridWatch-D against baseline anomaly detection approaches, given a fixed set of sensors.

Experimental Settings. For each graph, the sensor set for all algorithms is chosen as a uniformly random set of nodes of various sizes (the sizes are plotted on the x-axis of Fig. 3). Then, out of 480 time ticks, we first sample 50 random time ticks as the times when anomalies occur. In each such time tick, we deactivate a randomly chosen edge (i.e. no current can flow over that edge).

Using Matpower [37], we then generate voltage and current readings at each sensor. This requires an input time series of loads (i.e. real and reactive power at each node): we use load patterns estimated from real data [31] recorded from the Carnegie Mellon University (CMU) campus for 20 days from July 29 to August 17, 2016, scaled to a standard deviation of \(0.3\cdot \sigma \), with added Gaussian noise of \(0.2\cdot \sigma \), where \(\sigma \) is the standard deviation of the original time series [31].

This results in a time series of 480 time ticks (hourly data from 20 days), at each time recording the voltage at each sensor and the current at each edge adjacent to one of the sensors. Given this input, each algorithm then returns a ranking of the anomalies. We evaluate this using standard metrics, AUC (area under the ROC curve) and F-measure (\(\frac{2\cdot \text {precision} \cdot \text {recall}}{\text {precision} + \text {recall}}\)), the latter computed on the top 50 anomalies output by each algorithm.
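For concreteness, the F-measure used throughout the evaluation is the harmonic mean of precision and recall (the numeric values below are made up):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.8, 0.5))
```

Being a harmonic mean, it is dragged toward the weaker of the two quantities, so an algorithm cannot score well by trading one off entirely against the other.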

Baselines. Dynamic graph anomaly detection approaches [4, 6, 10, 22, 30] cannot be used as they require graphs with fully observed edge weights. Moreover, detecting failed power lines with all sensors present can be done by simply checking if any edge has current equal to 0, which is trivial. Hence, instead, we compare GridWatch-D to the following multidimensional anomaly detection methods: Isolation Forests [20], Vector Autoregression (VAR) [14], Local Outlier Factor (LOF) [8], and Parzen Window [24]. Each uses the currents and voltages at the given sensors as features. For VAR the norms of the residuals are used as anomaly scores; the remaining methods return anomaly scores directly. For Isolation Forests, we use 100 trees (following the scikit-learn defaults [26]). For VAR we select the order by maximizing AIC, following standard practice. For LOF and Parzen Window we use 20 neighbors.

Fig. 3.

Accurate anomaly detection: GridWatch-D outperforms the baselines. Left plots are for case2869; right plots are for case9241.

Figure 3 shows that GridWatch-D outperforms the baselines, by \(31\%\) to \(42\%\) in AUC and \(133\%\) to \(383\%\) in F-measure. The gains in performance likely come from the use of the 3 domain-knowledge based detectors, which combine information from the currents surrounding each sensor in a way that makes it clearer when an anomaly occurs.

Further testing shows that GridWatch-D’s 3 detectors all play a role: e.g. on case2869, for 50 sensors, GridWatch-D has F-measure 0.67, but only using single detectors 1, 2 or 3 (where detector 1 refers to the detector in Definition 1, and so on) gives F-measures of 0.51, 0.6 or 0.56 respectively.

5.2 Q2. Sensor Selection Quality

We now evaluate GridWatch-S. We use the same settings as in the previous sub-section, except that the sensors are now chosen using either GridWatch-S, or one of the following baselines. We then compute the anomaly detection performance of GridWatch-D as before on each choice of sensors. For GridWatch-S we use \(c=15\). For our simulated data sizes, we assume 2000 anomalies and 480 normal scenarios.

Baselines: randomly selected nodes (Random); highest degree nodes (Degree); nodes with highest total current in their adjacent edges (MaxCurrent); highest betweenness centrality [13] nodes, i.e. nodes with the most shortest paths passing through them, thus being the most ‘central’ (Betweenness); a power-grid based Optimal PMU Placement algorithm using depth-first search (OPP [7]).

Figure 4 shows that GridWatch-S outperforms the baselines, by 18 to \(19\%\) in AUC and 59 to \(62\%\) in F-measure.

Fig. 4.

GridWatch-S provides effective sensor selection: sensor selection using GridWatch-S results in higher detection accuracy than baselines.

Figure 1b shows the anomaly scores on case2869 over time, using the maximum of 200 sensors placed by GridWatch-S, with red crosses where true anomalies exist. Spikes in the anomaly score match the true anomalies very closely.

5.3 Q3. Scalability

Finally, we evaluate the scalability of GridWatch-D and GridWatch-S. To generate graphs of different sizes, we start with the IEEE 118-bus network [1], which represents a portion of the US power grid in 1962, and duplicate it \(2, 4, \cdots , 20\) times. To keep the power grid connected, after each duplication we add edges from each node to its counterpart in the previous copy; the parameters of each such edge are randomly sampled from those of the actual edges. We then run GridWatch-D and GridWatch-S using the same settings as the previous subsection. Figure 5 shows that GridWatch-D and GridWatch-S scale linearly; the blue line is the best-fit regression line.

Fig. 5.

Our algorithms scale linearly: wall-clock time of (a) GridWatch-D and (b) GridWatch-S against number of edges in \(\mathcal {G}\). (Color figure online)

6 Conclusion

In this paper, we proposed GridWatch-D, an online algorithm that accurately detects anomalies in power grid data. The main idea of GridWatch-D is to design domain-aware detectors that combine information at each sensor appropriately. We then proposed GridWatch-S, a sensor placement algorithm, which uses a submodular optimization objective. While our method could technically be applied to any type of graph-based sensor data (not just power grids), the choice of our detectors is motivated by our power grid setting. Hence, future work could study how sensitive various detectors are for detecting anomalies in graph-based sensor data from different domains.

Our contributions are as follows:

  1.

    Online anomaly detection: we propose a novel, online anomaly detection algorithm, GridWatch-D, that outperforms existing approaches.

  2.

    Sensor placement: we construct an optimization objective for sensor placement, with the goal of maximizing the probability of detecting an anomaly. We show that this objective is submodular, which we exploit in our sensor placement algorithm.

  3.

    Effectiveness: Due to submodularity, our sensor placement algorithm, GridWatch-S, is provably near-optimal. In addition, both our algorithms outperform existing approaches in accuracy by \(59\%\) or more (F-measure) in experiments.

  4.

    Scalability: Our algorithms scale linearly, and GridWatch-D is online, requiring bounded space and constant time per update.

Reproducibility: Our code and data are publicly available at github.com/bhooi/gridwatch.