A spectral approach to detecting subtle anomalies in graphs

Journal of Intelligent Information Systems

Abstract

In this paper we investigate the problem of detecting small and subtle subgraphs embedded in a large graph with a common structure. We refer to these embedded subgraphs as signals or anomalies and to the large graph as the background. Such small and subtle signals, which are structurally dissimilar to the background, are often hidden within graph communities and cannot be revealed by the graph's global structure. We explore the minor eigenvectors of the graph's adjacency matrix to detect subtle anomalies in a background graph. We demonstrate that when such subtle anomalies are present, there exist minor eigenvectors with extreme values on the entries corresponding to the anomalies. Under the Erdős–Rényi random graph model, we derive formulas that show how the eigenvector entries change and give detectability conditions. We then extend our theoretical study to the general case where multiple anomalies are embedded in a background graph with community structure. We develop an algorithm that uses eigenvector kurtosis to filter out the eigenvectors that capture the signals. Our theoretical analysis and empirical evaluations on both synthetic data and real social networks show the effectiveness of our approach in detecting subtle signals.
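To make the pipeline described above concrete, here is a minimal Python sketch (my own illustration, not the authors' released code; the graph sizes, densities, and the use of `scipy.stats.kurtosis` are assumptions). It embeds a small dense subgraph in an Erdős–Rényi background and ranks the adjacency eigenvectors by excess kurtosis, so that eigenvectors concentrated on the anomaly stand out.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
n, k, p_b, p_s = 2000, 30, 0.02, 0.6   # background size/density, signal size/density

# Erdos-Renyi background adjacency matrix
A = (rng.random((n, n)) < p_b).astype(float)
A = np.triu(A, 1); A = A + A.T

# embed a denser k-node subgraph ("signal") on the first k nodes
Sk = (rng.random((k, k)) < p_s).astype(float)
Sk = np.triu(Sk, 1)
A[:k, :k] = np.maximum(A[:k, :k], Sk + Sk.T)

# eigenvectors of the adjacency matrix; high excess kurtosis flags vectors
# whose mass is concentrated on a few (potentially anomalous) nodes
vals, vecs = np.linalg.eigh(A)
kurt = kurtosis(vecs, axis=0)            # Fisher definition, i.e. normal ~ 0
for idx in np.argsort(kurt)[::-1][:3]:   # top-3 eigenvectors by kurtosis
    top_nodes = np.argsort(np.abs(vecs[:, idx]))[::-1][:k]
    print(f"eigenvalue {vals[idx]:.2f}  kurtosis {kurt[idx]:.1f}  "
          f"overlap with planted signal: {np.sum(top_nodes < k)}/{k}")
```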


Notes

  1. http://snap.stanford.edu

  2. The 2-norm is the induced matrix norm: ||A||2 is the largest singular value of A. This norm is sub-multiplicative (Stewart and Sun 1990).
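A quick numerical check of these two facts (an illustrative snippet I added; the matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B = rng.standard_normal((50, 50)), rng.standard_normal((50, 50))

def norm2(M):
    # largest singular value = induced matrix 2-norm
    return np.linalg.svd(M, compute_uv=False)[0]

assert np.isclose(norm2(A), np.linalg.norm(A, 2))    # spectral norm
assert norm2(A @ B) <= norm2(A) * norm2(B) + 1e-9    # sub-multiplicativity
```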

References

  • Alon, N., Krivelevich, M., Sudakov, B. (1998). Finding a large hidden clique in a random graph. Random Structures & Algorithms, 13(3–4), 457–466.

  • Andersen, R., Chung, F., Lang, K. (2006). Local graph partitioning using pagerank vectors. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06) (pp. 475–486). IEEE.

  • Chan, P.K., Schlag, M.D.F., Zien, J.Y. (1993). Spectral k-way ratio-cut partitioning and clustering. In Proceedings of the 30th International Design Automation Conference (pp. 749–754). ACM, New York.

  • Eberle, W., & Holder, L.B. (2007). Anomaly detection in data represented as graphs. Journal of Intelligent Data Analysis, 11, 663–689. IOS Press Amsterdam, The Netherlands.

  • Erdős, P., & Rényi, A. (1959). On random graphs. Publicationes Mathematicae Debrecen, 6, 290–297.

  • Füredi, Z., & Komlós, J. (1981). The eigenvalues of random symmetric matrices. Combinatorica, 1, 233–241. doi:10.1007/BF02579329.

  • Golub, G.H., & Van Loan, C.F. (1996). Matrix Computations. The Johns Hopkins University Press.

  • Hagen, L.W., & Kahng, A.B. (1992). New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 11(9), 1074–1085.

  • Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W. (2008). Statistical properties of community structure in large social and information networks. In WWW (pp. 695–704).

  • Miller, B., Bliss, N., Wolfe, P. (2010). Subgraph detection using eigenvector l1 norms. In Advances in Neural Information Processing Systems (vol. 23).

  • Mitra, P. (2009). Entrywise bounds for eigenvectors of random graphs. The Electronic Journal of Combinatorics.

  • Newman, M.E.J. (2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3), 036104.

  • Ng, A.Y., Jordan, M.I., Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems (pp. 849–856).

  • Noble, C.C., & Cook, D.J. (2003). Graph-based anomaly detection. In KDD (pp. 631–636).

  • Prakash, B.A., Sridharan, A., Seshadri, M., Machiraju, S., Faloutsos, C. (2010). Eigenspokes: Surprising patterns and scalable community chipping in large graphs. In PAKDD (2) (pp. 435–448).

  • Seary, A.J., & Richards, W.D. (2003). Spectral methods for analyzing and visualizing networks: An introduction. In Dynamic Social Network Modelling and Analysis: Workshop Summary and Papers (pp. 209–228). National Research Council.

  • Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.

  • Shrivastava, N., Majumder, A., Rastogi, R. (2008). Mining (social) network graphs to detect random link attacks. In ICDE (pp. 486–495).

  • Stewart, G.W., & Sun, J.G. (1990). Matrix Perturbation Theory. New York: Academic Press.

  • Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.

  • Wu, L., Ying, X., Wu, X., Zhou, Z.-H. (2011). Line orthogonality in adjacency eigenspace with application to community partition. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence.

  • Ying, X., Wu, X., Barbará, D. (2011). Spectrum based fraud detection in social networks. In ICDE (pp. 912–923).

Acknowledgements

This work was supported in part by U.S. National Science Foundation CCF-1047621 and CNS-0831204 for Leting Wu, Xintao Wu, and Aidong Lu and by NSFC 61021062 and 61273301 for Zhi-Hua Zhou.

Author information

Corresponding author

Correspondence to Xintao Wu.

A Proof

Proof of Theorem 3

Let \(U =({\boldsymbol{y}_{2}},\dots,{\boldsymbol{y}_{n}})\) and \(V = \mathrm{diag}({\mu_{2}},\dots,{\mu_{n}})\). Applying Theorem V.2.8 in Stewart and Sun (1990), we have:

$$\label{Eq:FromStewart} {\boldsymbol{x}_{1}} = {\boldsymbol{y}_{1}}+U({\mu_{1}} I-V)^{-1}U^TS{\boldsymbol{y}_{1}}, $$
(15)

Equation 15 is quite complex because it involves all the eigenpairs of B. We can further simplify it as:

$$ {\boldsymbol{x}_{1}} = {\boldsymbol{y}_{1}}+\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}} , $$
(16)

In order to apply Theorem V.2.8 in Stewart and Sun (1990), the following two conditions need to be satisfied (see footnote 2):

  1. \(\delta = |{\mu_{1}}-{\mu_{2}}|-\|{\boldsymbol{y}_{1}}^T S {\boldsymbol{y}_{1}}\|_2-\|U^T S U\|_2>0\);

  2. \(\gamma = \|U^T S {\boldsymbol{y}_{1}}\|_2 <\frac{1}{2}\delta\).

By (7), \(\|{\boldsymbol{y}_{1}}^TS{\boldsymbol{y}_{1}}\|_2 \approx \sum_{i=1}^k \sum_{j=1}^k \frac{s_{ij}}{n} \approx \frac{k(k-1){p_{s}}}{n}\). U is an n × (n − 1) matrix whose singular values are all 1. So \(\delta \geq |{\mu_{1}}-{\mu_{2}}|-\|{\boldsymbol{y}_{1}}^T S {\boldsymbol{y}_{1}}\|_2-\|U^T\|_2 \|S\|_2 \|U\|_2\approx {\mu_{1}}-{\mu_{2}} - (1+\frac{k}{n})k{p_{s}}\). For this lower bound to be positive, and hence Condition 1 to hold, we require:

$$\label{Ineq:Cond1} k{p_{s}}<\frac{{\mu_{1}}-{\mu_{2}}}{1+\frac{k}{n}}. $$
(17)

For Condition 2,

$$ \begin{array}{lll} \|U^TS{\boldsymbol{y}_{1}}\|_2 &\leq& \|U^T\|_2\|S{\boldsymbol{y}_{1}}\|_2 \\ &\approx& \sqrt{\sum\limits_{i=1}^k \left(\sum\limits_{j=1}^k\frac{s_{ij}}{\sqrt{n}} \right)^2} \approx \sqrt{\frac{k}{n}}(k-1){p_{s}} \label{eq:gamma} \end{array} $$
(18)

So to satisfy Condition 2, we require

$$\label{Ineq:Cond2} k{p_{s}} <\frac{ {\mu_{1}}-{\mu_{2}}}{1+2\sqrt{\frac{k}{n}}+\frac{k}{n}} $$
(19)

Combining (17) and (19), Inequality (10) holds when \(k{p_{s}} <\frac{ {\mu_{1}}-{\mu_{2}}}{1+2\sqrt{\frac{k}{n}}+\frac{k}{n}}\). By Theorem 2, we have \({\mu_{1}} \approx n{p_b}\) and \({\mu_{2}}\leq 2\sqrt{n{p_b}(1-{p_b})}\), so \({\mu_{1}} \gg {\mu_{2}}\) when \(n{p_b}\) is large. We also assume k = o(n). The condition can therefore be further simplified to

$$\label{Cond:R1} k{p_{s}} <\frac{ n{p_b}}{1+2\sqrt{\frac{k}{n}}}$$
(20)

Finally, we discuss the approximation error, which consists of two parts. The first part, \(\epsilon_1\), comes from the higher-order terms neglected in the approximation of Theorem V.2.8, and \(\|\epsilon_1\|_2 \sim O(\frac{k}{n})\). The second part is \(\epsilon_2 = \|U({\mu_{1}}I-V)^{-1}U^TS{\boldsymbol{y}_{1}}-\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}}\|_2 \leq \|U\|_2\left\|({\mu_{1}} I-V)^{-1}\right\|_2\left\|U^T\right\|_2\left\|S{\boldsymbol{y}_{1}}\right\|_2 +\frac{\left\|S{\boldsymbol{y}_{1}}\right\|_2}{{\mu_{1}}} \approx\frac{ \sqrt{k}k{p_{s}}(2{\mu_{1}}-{\mu_{2}})}{\sqrt{n}{\mu_{1}}({\mu_{1}}-{\mu_{2}})}\). By (20), \(\epsilon_2 \sim O\left(\sqrt{\frac{k}{n}}\right)\). Combining the two parts, the total approximation error is about \( O\left(\sqrt{\frac{k}{n}}\right)\).□
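As a sanity check of Theorem 3, the following sketch (my own assumed experiment, not code from the paper; the parameters n, k, p_b, p_s are chosen to roughly satisfy condition (20)) compares the exact leading eigenvector of A = B + S with the first-order approximation in (16) and reports the error against the predicted O(√(k/n)) scale.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, p_b, p_s = 1500, 30, 0.02, 0.4     # k*p_s = 12 < n*p_b/(1+2*sqrt(k/n)) ~ 23

B = (rng.random((n, n)) < p_b).astype(float)
B = np.triu(B, 1); B = B + B.T                       # background adjacency
S = np.zeros((n, n))
Sk = (rng.random((k, k)) < p_s).astype(float)
Sk = np.triu(Sk, 1); S[:k, :k] = Sk + Sk.T           # signal edges on the first k nodes

mu, Y = np.linalg.eigh(B)
mu1, y1 = mu[-1], Y[:, -1]
if y1.sum() < 0:                                     # fix sign so y1 ~ 1/sqrt(n)
    y1 = -y1

x1 = np.linalg.eigh(B + S)[1][:, -1]
if x1 @ y1 < 0:                                      # align signs
    x1 = -x1

approx = y1 + (S @ y1) / mu1                         # Eq. (16)
print("||x1 - approx||_2 =", np.linalg.norm(x1 - approx),
      "  sqrt(k/n) =", np.sqrt(k / n))
```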

Proof of Theorem 4

Let \({\boldsymbol{v}} =\frac{{\boldsymbol{z}_{1}} -({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}){\boldsymbol{x}_{1}}}{\|{\boldsymbol{z}_{1}}-({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}){\boldsymbol{x}_{1}}\|_2}\). Since \({\boldsymbol{v}}^T{\boldsymbol{x}_{1}}=0\), we have the decomposition \({\boldsymbol{v}}= \sum_{j = 2}^n c_j{\boldsymbol{x}_{j}}\) where \(\sum_{j=2}^n c_j^2=1\). Substituting this decomposition, we have \(\|A{\boldsymbol{v}} - q{\boldsymbol{v}}\|_2^2 = \sum_{i=2}^n c_i^2({\lambda_{i}}-q)^2\). So for an arbitrary subset of the c i ’s, we have the upper bound:

$$\sum\limits_{i} c_i^2\leq \frac{\|A{\boldsymbol{v}} - q{\boldsymbol{v}}\|_2^2}{\min({\lambda_{i}}-q)^2}.$$

Plugging in A = B + S and \({\boldsymbol{v}}\), we estimate the value of \(\|A{\boldsymbol{v}} - q{\boldsymbol{v}}\|_2^2\): \(\|A{\boldsymbol{v}} - q{\boldsymbol{v}}\|_2 =\|\left(B{\boldsymbol{z}_{1}}+(q-{\lambda_{1}})({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}){\boldsymbol{x}_{1}}\right)+\left(S{\boldsymbol{z}_{1}}-q{\boldsymbol{z}_{1}}\right)\|_2/\|{\boldsymbol{z}_{1}}-({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}){\boldsymbol{x}_{1}}\|_2\). We let \(q = {\nu_{1}} \approx k{p_{s}}\) so that \(S{\boldsymbol{z}_{1}}-q{\boldsymbol{z}_{1}}=0\). By (11) and k = o(n), \({\lambda_{1}} = R(A,{\boldsymbol{x}_{1}})\approx \frac{n^2{p_b} + k^2{p_{s}}}{n} \approx n{p_b}\). By (9) and (10), \({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}} \approx \sqrt{\frac{k}{n}}\). Thus we have \(B{\boldsymbol{z}_{1}}\approx {\lambda_{1}}({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}){\boldsymbol{x}_{1}}\) and \(\|{\boldsymbol{z}_{1}}-({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}){\boldsymbol{x}_{1}}\|_2 = \sqrt{1-({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}})^2}\approx \sqrt{1-\frac{k}{n}}\). Finally, we have

$$\|A{\boldsymbol{v}} - q{\boldsymbol{v}}\|_2\approx \frac{\|q({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}){\boldsymbol{x}_{1}}\|_2}{\|{\boldsymbol{z}_{1}}-({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}){\boldsymbol{x}_{1}}\|_2}\approx k{p_{s}} \sqrt{\frac{k}{n-k}}.$$

For \({\lambda_{i}} \notin \left(k{p_{s}}-\sqrt{\frac{k}{c^2(n-k)}}k{p_{s}},\;k{p_{s}}+\sqrt{\frac{k}{c^2(n-k)}}k{p_{s}}\right)\), the sum of the corresponding \(c_i^2\)’s is bounded by \(c^2\). So when \({\lambda_{3}}<k{p_{s}}-\sqrt{\frac{k}{c^2(n-k)}}k{p_{s}}<{\lambda_{2}}<k{p_{s}}+\sqrt{\frac{k}{c^2(n-k)}}k{p_{s}}\), we have

$$\sum_{i=3}^n c_i^2\leq \frac{\|A{\boldsymbol{v}} - q{\boldsymbol{v}}\|_2^2}{({\lambda_{3}}-q)^2} <c^2.$$

So \(c_2 = \sqrt{1-\sum_{i=3}^n c_i^2} > \sqrt{1-c^{2}}\).□
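The claim of Theorem 4 can also be illustrated numerically. The sketch below (an assumed setup of mine, not from the paper; z_1 is taken as the leading eigenvector of the signal block padded with zeros) checks that the second eigenvector x_2 of A = B + S is nearly aligned with the direction of z_1 orthogonalized against x_1, i.e. that |c_2| is close to 1.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, p_b, p_s = 2000, 30, 0.02, 0.6

B = (rng.random((n, n)) < p_b).astype(float)
B = np.triu(B, 1); B = B + B.T
S = np.zeros((n, n))
Sk = (rng.random((k, k)) < p_s).astype(float)
Sk = np.triu(Sk, 1); S[:k, :k] = Sk + Sk.T

A = B + S
vecs = np.linalg.eigh(A)[1]
x1, x2 = vecs[:, -1], vecs[:, -2]

z1 = np.zeros(n)
z1[:k] = np.linalg.eigh(S[:k, :k])[1][:, -1]          # leading eigenvector of S, padded

v = z1 - (z1 @ x1) * x1                               # z1 with the x1 component removed
v /= np.linalg.norm(v)
print("|c_2| = |v^T x_2| =", abs(v @ x2))             # close to 1 for a detectable signal
```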

1.1 Theoretical explanation of the L1-norm and kurtosis of eigenvectors

When the signal is added, the L1-norms are as follows:

$$ |{\boldsymbol{x}_{1}}|\approx \sqrt{n} + \frac{k^2{p_{s}}}{n\sqrt{n}{p_b}}, \text{ }|{\boldsymbol{x}_{2}}|\approx \sqrt{k}\left(2+\frac{kp_s}{np_b}\right)+\eta+ \tau $$

where η is the error term caused by \({\xi}{{\boldsymbol{x}_{2}}}\) and τ is the error term introduced when taking the L1-norm of the error term.

\(|{\boldsymbol{x}_{1}}|\) is not affected by changes in the signal, whereas \(|{\boldsymbol{x}_{2}}|\) changes dramatically. For a signal of small size, \(|{\boldsymbol{x}_{2}}|\) has a relatively small value close to \(O(\sqrt{k})\). However, the noise terms have a large impact because they accumulate over all the entries. When the signal is strong, the values of the last n − k entries are shifted away from zero mostly in the same direction, so they mostly share the same sign and τ ≈ 0. For such signals, η is the major noise affecting the L1-norm of the signal eigenvector, and \(|{\boldsymbol{x}_{2}}|\) increases as the density of the signal increases. When the signal is too strong, \(|{\boldsymbol{x}_{2}}|\) may become close to \(\sqrt{\frac{2n}{\pi}}\), so that the signal can no longer be detected by the L1-norm. For a weak signal, the values of the last n − k entries can be either positive or negative near zero, so τ begins to increase. The weaker the signal, the more impact the noise term has, and τ becomes the major noise. \(|{\boldsymbol{x}_{2}}|\) then increases as the density of the signal decreases, and finally, when the signal cannot be separated from the background, \(|{\boldsymbol{x}_{2}}|\) goes back to \(\sqrt{\frac{2n}{\pi}}\).
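A brief numerical illustration of these magnitudes (my own example, not an experiment from the paper): a flat unit vector of length n has L1-norm √n, a unit vector localized on k entries has L1-norm about √k, and a noise-like unit vector has L1-norm about √(2n/π).

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 2000, 30

flat = np.full(n, 1.0 / np.sqrt(n))                          # ~ y_1
localized = np.zeros(n); localized[:k] = 1.0 / np.sqrt(k)    # ~ x_2 for a strong signal
bulk = rng.standard_normal(n); bulk /= np.linalg.norm(bulk)  # ~ a noise eigenvector

for name, v in [("flat", flat), ("localized", localized), ("bulk-like", bulk)]:
    print(f"{name:10s} |v|_1 = {np.abs(v).sum():6.2f}")
print("sqrt(n) =", np.sqrt(n), " sqrt(k) =", np.sqrt(k),
      " sqrt(2n/pi) =", np.sqrt(2 * n / np.pi))
```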

For a vector \({\boldsymbol{v}} = {\boldsymbol{v}}_1 + {\boldsymbol{v}}_2\),

$${\kappa}({\boldsymbol{v}}) = \frac{n\sum_{i=1}^n({\boldsymbol{v}}_1(i) +{\boldsymbol{v}}_2(i) - \bar{v}_1-\bar{v}_2 )^4}{\left(\sum_{i=1}^n({\boldsymbol{v}}_1(i) +{\boldsymbol{v}}_2(i) - \bar{v}_1-\bar{v}_2 )^2\right)^2}-3$$

where \(\bar{v}_1= \frac{1}{n}\sum_{i=1}^{n} {{\boldsymbol{v}}_1(i)}\) and \(\bar{v}_2= \frac{1}{n}\sum_{i=1}^{n} {{\boldsymbol{v}}_2(i)}\). In our scenario, we usually have \({\boldsymbol{v}}_1\) with a flat distribution and \({\boldsymbol{v}}_2\) with a distribution that has large nonzero values only on the first k entries, where k = o(n). It means that \(|{\boldsymbol{v}}_2(i)- \bar{v}_2| \gg |{\boldsymbol{v}}_1(i)- \bar{v}_1| \) for almost all i ≤ k while \(|{\boldsymbol{v}}_1(i)- \bar{v}_1|\) and \(|{\boldsymbol{v}}_2(i)- \bar{v}_2|\) are both close to zero for almost all i > k. Thus we have:

$${\kappa}({\boldsymbol{v}}) \approx \frac{n\sum_{i=1}^k({\boldsymbol{v}}_2(i) -\bar{v}_2 )^4}{\left(\sum_{i=1}^k({\boldsymbol{v}}_2(i) -\bar{v}_2 )^2\right)^2}-3 \approx {\kappa}({\boldsymbol{v}}_2).$$

If \(|{\boldsymbol{v}}_2(i)- \bar{v}_2|\) for i ≤ k is not significantly large, the terms related to \({\boldsymbol{v}}_2\) can all be omitted because k = o(n). So we have:

$${\kappa}({\boldsymbol{v}}) \approx \frac{n\sum_{i=1}^n({\boldsymbol{v}}_1(i) -\bar{v}_1 )^4}{\left(\sum_{i=1}^n({\boldsymbol{v}}_1(i) -\bar{v}_1 )^2\right)^2}-3 \approx {\kappa}({\boldsymbol{v}}_1)$$
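The two regimes above can be reproduced with a small synthetic example (assumed purely for illustration): when the spike component v_2 dominates on its k entries, κ(v_1 + v_2) ≈ κ(v_2); when the spikes are small relative to the flat component, κ(v_1 + v_2) ≈ κ(v_1) ≈ 0.

```python
import numpy as np
from scipy.stats import kurtosis   # Fisher (excess) kurtosis, i.e. the "-3" definition above

rng = np.random.default_rng(5)
n, k = 2000, 30
v1 = rng.standard_normal(n) / np.sqrt(n)       # flat component, entries ~ 1/sqrt(n)
v2 = np.zeros(n)

for spike in (1.0, 0.01):                      # strong spikes, then weak spikes
    v2[:k] = spike
    v = v1 + v2
    print(f"spike={spike}: kappa(v)={kurtosis(v):7.1f}  "
          f"kappa(v1)={kurtosis(v1):5.2f}  kappa(v2)={kurtosis(v2):5.1f}")
```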

For \({\boldsymbol{x}_{1}}\), we have the approximation:

$${\boldsymbol{x}_{1}} \approx {\boldsymbol{y}_{1}}+\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}}, $$

where \({\boldsymbol{y}_{1}}\) has a flat distribution and \(\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}}\) has nonzero values only on the first k entries. With a strong signal, \(\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}}\) has large values on the first k entries, so that \({\kappa}({\boldsymbol{x}_{1}}) \approx {\kappa} \left(\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}} \right)\). When the signal becomes weaker, \(\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}}\) has smaller values on the first k entries and thus \({\kappa} \left(\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}} \right)\) decreases. We observe that \({\kappa}({\boldsymbol{x}_{1}})\) decreases until it is finally close to \({\kappa}({\boldsymbol{y}_{1}})\) when the signal is so weak that it is mixed with the background.

We can similarly derive the result for \({\boldsymbol{x}_{2}}\), for which we have the approximation:

$$ {\boldsymbol{x}_{2}} \approx \frac{{\boldsymbol{z}_{1}} - \left({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}\right){\boldsymbol{x}_{1}}}{\|{\boldsymbol{z}_{1}} -\left({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}\right){\boldsymbol{x}_{1}}\|_2} \approx {\boldsymbol{z}_{1}} - \left({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}\right)\left({\boldsymbol{y}_{1}} + \frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}}\right), $$

where \(({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}){\boldsymbol{y}_{1}}\) has a flat distribution and \( {\boldsymbol{z}_{1}} -({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}})\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}}\) has nonzero values only on the first k entries. With a strong signal, we have \({\kappa}({\boldsymbol{x}_{2}}) \approx {\kappa}\left( {\boldsymbol{z}_{1}} -({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}})\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}}\right)\). Since \({\boldsymbol{z}_{1}}\) has much larger values than \(({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}})\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}}\) on the first k entries and both have zero values on the other entries, \({\kappa}({\boldsymbol{x}_{2}}) \approx {\kappa}({\boldsymbol{z}_{1}}) \). Since \({\boldsymbol{z}_{1}}\) has much larger values on the first k nodes than \(\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}}\), we have \({\kappa}({\boldsymbol{z}_{1}}) > {\kappa} \left(\frac{S{\boldsymbol{y}_{1}}}{{\mu_{1}}}\right)\) and thus \({\kappa}({\boldsymbol{x}_{2}}) > {\kappa}({\boldsymbol{x}_{1}})\). When the signal becomes weaker, \({\kappa}({\boldsymbol{z}_{1}})\) decreases, so \({\kappa}({\boldsymbol{x}_{2}})\) decreases. When the signal is so weak that it is mixed with the background, \({\kappa}({\boldsymbol{x}_{2}}) \approx {\kappa}\left(({\boldsymbol{z}_{1}}^T{\boldsymbol{x}_{1}}){\boldsymbol{y}_{1}}\right) = {\kappa}({\boldsymbol{y}_{1}})\).
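As a final check of \({\kappa}({\boldsymbol{x}_{2}}) > {\kappa}({\boldsymbol{x}_{1}})\), the snippet below (reusing the assumed planted model from the earlier sketches, not the paper's experiments) computes the excess kurtosis of the two leading adjacency eigenvectors of A = B + S.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(6)
n, k, p_b, p_s = 2000, 30, 0.02, 0.6

A = (rng.random((n, n)) < p_b).astype(float)
A = np.triu(A, 1); A = A + A.T
Sk = (rng.random((k, k)) < p_s).astype(float)
Sk = np.triu(Sk, 1)
A[:k, :k] = np.maximum(A[:k, :k], Sk + Sk.T)     # overlay the k-node signal

vecs = np.linalg.eigh(A)[1]
x1, x2 = vecs[:, -1], vecs[:, -2]
print("kappa(x1) =", kurtosis(x1), "  kappa(x2) =", kurtosis(x2))   # expect kappa(x2) >> kappa(x1)
```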

The above theoretical justifications match our observations shown in Table 3.

Cite this article

Wu, L., Wu, X., Lu, A. et al. A spectral approach to detecting subtle anomalies in graphs. J Intell Inf Syst 41, 313–337 (2013). https://doi.org/10.1007/s10844-013-0246-7
