
Top-k overlapping densest subgraphs

Data Mining and Knowledge Discovery

Abstract

Finding dense subgraphs is an important problem in graph mining and has many practical applications. At the same time, while large real-world networks are known to have many communities that are not well-separated, the majority of the existing work focuses on the problem of finding a single densest subgraph. Hence, it is natural to consider the question of finding the top-k densest subgraphs. One major challenge in addressing this question is how to handle overlaps: eliminating overlaps completely is one option, but this may lead to extracting subgraphs that are not as dense as would be possible if a limited amount of overlap were allowed. Furthermore, overlaps are desirable, as in most real-world graphs there are vertices that belong to more than one community and thus to more than one densest subgraph. In this paper we study the problem of finding top-k overlapping densest subgraphs, and we present a new approach that improves over the existing techniques, both in theory and in practice. First, we reformulate the problem definition so that we are able to obtain an algorithm with a constant-factor approximation guarantee. Our approach relies on techniques for solving the max-sum diversification problem, which, however, we need to extend in order to make them applicable to our setting. Second, we evaluate our algorithm on a collection of benchmark datasets and show that it convincingly outperforms the previous methods, both in terms of quality and efficiency.


Notes

  1. Here we use the fact that edges are not weighted, and consequently the queue can be implemented as an array of linked lists of vertices.

  2. http://research.ics.aalto.fi/dmg/dos_code.tgz.

  3. The synthetic networks used in our experiments are available at http://research.ics.aalto.fi/dmg/dos_synth.tgz.

  4. http://dblp.uni-trier.de/xml/.

  5. Namely, S. Abiteboul, E. Demaine, M. Ester, C. Faloutsos, J. Han, G. Karypis, J. Kleinberg, H. Mannila, K. Mehlhorn, C. Papadimitriou, B. Shneiderman, G. Weikum and P. Yu.

  6. http://snap.stanford.edu.

  7. Namely, Oceania, Latin-America, the USA, Europe, the Middle-East and East Asia.
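Note 1 mentions a queue implemented as an array of lists of vertices, which is possible because edges are unweighted. The following is a minimal sketch of that idea, applied to the classic greedy peeling heuristic for the densest-subgraph problem on a simple unweighted graph. It is an illustration under stated assumptions, not the article's Peel routine: the function name `densest_by_peeling` is hypothetical, density is taken to be |E| / |V|, and Python sets stand in for the linked lists.

```python
def densest_by_peeling(n, edges):
    """Greedy peeling on a simple unweighted graph with vertices 0..n-1.

    Repeatedly removes a minimum-degree vertex and returns the densest
    intermediate subgraph as (density, vertex_set), density = |E| / |V|.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = [len(a) for a in adj]

    # Bucket queue: buckets[d] holds the remaining vertices of degree d.
    buckets = [set() for _ in range(n)]
    for v in range(n):
        buckets[deg[v]].add(v)

    alive = [True] * n
    m, size = sum(deg) // 2, n
    best_density, best_size = m / n, n
    order = []          # vertices in the order they are peeled
    d = 0               # lower bound on the current minimum degree
    while size > 0:
        while not buckets[d]:
            d += 1
        v = buckets[d].pop()
        alive[v] = False
        order.append(v)
        m -= deg[v]
        size -= 1
        for w in adj[v]:
            if alive[w]:
                # Move w one bucket down; this is O(1) per edge.
                buckets[deg[w]].remove(w)
                deg[w] -= 1
                buckets[deg[w]].add(w)
        # A removal lowers each neighbour's degree by one, so the
        # minimum degree can drop by at most one.
        d = max(d - 1, 0)
        if size > 0 and m / size > best_density:
            best_density, best_size = m / size, size
    # The subgraph of size s consists of the last s peeled vertices.
    return best_density, set(order[n - best_size:])


# A 4-clique with a pendant path attached to vertex 3.
example_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
density, vertices = densest_by_peeling(6, example_edges)
```

Because every degree is an integer between 0 and n - 1, finding the next vertex to remove never requires a heap, and the whole loop runs in time linear in the number of vertices and edges.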


Author information

Corresponding author

Correspondence to Nikolaj Tatti.

Additional information

Responsible editor: Thomas Gärtner, Mirco Nanni, Andrea Passerini and Celine Robardet.

Appendices

Appendix: Proof of Proposition 1

Let us first define \(h(x;\,Y) = \left[ f(x \cup Y) - f(Y)\right] / 2\) and

$$\begin{aligned} g(x;\,Y) = h(x;\,Y) + d \mathopen {}\left( Y \cup x\right) - d \mathopen {}\left( Y\right) = h(x;\,Y) + d \mathopen {}\left( x,\,Y\right) . \end{aligned}$$

For proving the proposition, we will need Lemma 1.

Lemma 1

Let \( d \) be a c-relaxed metric. Let X and Y be two disjoint sets. Then

$$\begin{aligned} c({\left| X\right| } - 1) d \mathopen {}\left( X,\,Y\right) \ge {\left| Y\right| } d \mathopen {}\left( X\right) . \end{aligned}$$

Proof

Let \(y \in Y\) and \(x,\,z \in X.\) By definition,

$$\begin{aligned} c( d \mathopen {}\left( x,\,y\right) + d \mathopen {}\left( z,\,y\right) ) \ge d \mathopen {}\left( x,\,z\right) . \end{aligned}$$

For a given \(x \in X,\) there are exactly \({\left| X\right| } - 1\) pairs \((x,\,z)\) such that \(x \ne z \in X.\) Consequently, summing over all \(x,\,z \in X\) such that \(x \ne z\) gives us

$$\begin{aligned} 2c({\left| X\right| } - 1) d \mathopen {}\left( X,\,y\right) \ge 2 d \mathopen {}\left( X\right) . \end{aligned}$$

Summing over \(y \in Y\) proves the lemma. \(\square \)
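Lemma 1 can be sanity-checked numerically. The sketch below assumes, as the proof suggests, that \(d(X)\) is the sum of distances over unordered pairs inside X and that \(d(X,\,Y)\) is the sum over pairs with one endpoint in each set. The helper names are hypothetical, and the two distance functions (absolute difference with \(c = 1\), squared difference with \(c = 2\)) are chosen only for illustration.

```python
import itertools
import random


def pair_sum(dist, X):
    """d(X): sum of distances over unordered pairs inside X."""
    return sum(dist(x, z) for x, z in itertools.combinations(X, 2))


def cross_sum(dist, X, Y):
    """d(X, Y): sum of distances over pairs with one endpoint in each set."""
    return sum(dist(x, y) for x in X for y in Y)


def lemma1_holds(dist, c, X, Y, tol=1e-9):
    """Check c (|X| - 1) d(X, Y) >= |Y| d(X) for disjoint X and Y."""
    lhs = c * (len(X) - 1) * cross_sum(dist, X, Y)
    rhs = len(Y) * pair_sum(dist, X)
    return lhs + tol >= rhs


def absolute(a, b):
    return abs(a - b)        # a metric, i.e. 1-relaxed


def squared(a, b):
    return (a - b) ** 2      # 2-relaxed, since (u + v)^2 <= 2 (u^2 + v^2)


rng = random.Random(0)
points = [rng.uniform(0, 10) for _ in range(12)]
X, Y = points[:7], points[7:]
absolute_ok = lemma1_holds(absolute, 1, X, Y)
squared_ok = lemma1_holds(squared, 2, X, Y)
```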

Proof of Proposition 1

Let \(G_1 \subset \cdots \subset G_k\) be the sets constructed by Greedy. Fix \(1 \le i \le k.\) Then \(G_i\) is the current solution after the ith iteration of Greedy.

Let O be the optimal solution. Write \(A = O \cap G_i,\,C = O \setminus A,\) and \(B = G_i \setminus A.\) Lemma 1 implies that

$$\begin{aligned} c({\left| A\right| } - 1) d \mathopen {}\left( A,\,C\right) \ge {\left| C\right| } d \mathopen {}\left( A\right) , \end{aligned}$$

which in turn implies

$$\begin{aligned} \begin{aligned} {\left| C\right| }i( d \mathopen {}\left( A\right) + d \mathopen {}\left( A,\,C\right) )&\le ci({\left| A\right| } - 1) d \mathopen {}\left( A,\,C\right) + {\left| C\right| }i d \mathopen {}\left( A,\,C\right) \\&\le ci({\left| A\right| } - 1 + {\left| C\right| }) d \mathopen {}\left( A,\,C\right) \\&= ci(k - 1) d \mathopen {}\left( A,\,C\right) , \\ \end{aligned} \end{aligned}$$

where the second inequality uses \(c \ge 1\) and the final equality uses \({\left| A\right| } + {\left| C\right| } = {\left| O\right| } = k.\)

Moreover, Lemma 1 implies that

$$\begin{aligned} \begin{aligned} c({\left| C\right| } - 1) d \mathopen {}\left( B,\,C\right)&\ge {\left| B\right| } d \mathopen {}\left( C\right) , \\ c({\left| C\right| } - 1) d \mathopen {}\left( A,\,C\right)&\ge {\left| A\right| } d \mathopen {}\left( C\right) , \\ \end{aligned} \end{aligned}$$

which, together with \({\left| C\right| } = {\left| B\right| } + k - i,\) implies

$$\begin{aligned} \begin{aligned} {\left| C\right| }i d \mathopen {}\left( C\right)&= (k - i)i d \mathopen {}\left( C\right) + {\left| B\right| }i d \mathopen {}\left( C\right) \\&= (k - i)({\left| A\right| } + {\left| B\right| }) d \mathopen {}\left( C\right) + {\left| B\right| }i d \mathopen {}\left( C\right) \\&= (k - i){\left| A\right| } d \mathopen {}\left( C\right) + {\left| B\right| }k d \mathopen {}\left( C\right) \\&\le c(k - i)({\left| C\right| } - 1) d \mathopen {}\left( A,\,C\right) + ck({\left| C\right| } - 1) d \mathopen {}\left( B,\,C\right) \\&\le c(k - i)(k - 1) d \mathopen {}\left( A,\, C\right) + ck(k - 1) d \mathopen {}\left( B,\,C\right) .\\ \end{aligned} \end{aligned}$$

Combining these two inequalities leads us to

$$\begin{aligned} \begin{aligned} {\left| C\right| }i d \mathopen {}\left( O\right)&= {\left| C\right| }i d \mathopen {}\left( A\right) + {\left| C\right| }i d \mathopen {}\left( C\right) + {\left| C\right| }i d \mathopen {}\left( A,\,C\right) \\&\le ck(k - 1)( d \mathopen {}\left( A,\,C\right) + d \mathopen {}\left( B,\, C\right) ) \\&= ck(k - 1)d\left( G_i,\,C\right) . \\ \end{aligned} \end{aligned}$$

Submodularity and monotonicity imply

$$\begin{aligned} \begin{aligned} \sum _{v \in C} g\left( v;\,G_i\right)&= \sum _{v \in C} \left[ h\left( v;\,G_i\right) + d\left( \{v\},\, G_i\right) \right] \\&= \left( \sum _{v \in C} h\left( v;\, G_i\right) \right) + d\left( C,\,G_i\right) \\&\ge \frac{1}{2}\left[ f(O) - f\left( G_i\right) \right] + \frac{i{\left| C\right| }}{ck(k - 1)} d \mathopen {}\left( O\right) \\&\ge \frac{1}{2}\left[ f(O) - f\left( G_k\right) \right] + \frac{i{\left| C\right| }}{ck(k - 1)} d \mathopen {}\left( O\right) . \end{aligned} \end{aligned}$$

Let \(u_i\) be the item added at the \(i + 1\)th step, \(G_{i + 1} = \left\{ u_i\right\} \cup G_i.\) Then, since \(g(u_i;\,G_i) \ge \alpha g(v;\,G_i)\) for any \(v \in C,\)

$$\begin{aligned} g\left( u_i;\, G_i\right) \ge \frac{\alpha }{2k}\left[ f(O) - f\left( G_k\right) \right] + \frac{i\alpha }{ck(k - 1)} d \mathopen {}\left( O\right) . \end{aligned}$$

Summing over i gives us

$$\begin{aligned} \frac{1}{2}f\left( G_k\right) + d\left( G_k\right) = \sum _{i = 0}^{k - 1}g\left( u_i;\, G_i\right) \ge \frac{\alpha }{2}\left[ f(O) - f\left( G_k\right) \right] + \frac{\alpha }{2c} d \mathopen {}\left( O\right) . \end{aligned}$$

Since \(\alpha \le 1\) and \(c \ge 1,\) we have

$$\begin{aligned} r\left( G_k\right) = f\left( G_k\right) + d\left( G_k\right) \ge \frac{\alpha }{2}f(O) + \frac{\alpha }{2c} d \mathopen {}\left( O\right) \ge \frac{\alpha }{2c} r \mathopen {}\left( O\right) , \end{aligned}$$

which completes the proof. \(\square \)
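To make the objects in this proof concrete, here is a small sketch of a greedy procedure that, at each step, adds the item maximizing \(g(x;\,Y) = \left[ f(\{x\} \cup Y) - f(Y)\right] / 2 + d(x,\,Y)\). It is an illustration under assumptions, not the article's implementation: f is taken to be a coverage function (monotone and submodular), d is the absolute difference between points on a line (a metric, so \(c = 1\)), the greedy choice is exact (\(\alpha = 1\)), and all function names are hypothetical.

```python
import itertools


def coverage(sets, chosen):
    """f: number of elements covered by the chosen sets."""
    return len(set().union(*(sets[i] for i in chosen)))


def dispersion(points, chosen):
    """d: sum of pairwise distances inside the chosen index set."""
    return sum(abs(points[a] - points[b])
               for a, b in itertools.combinations(chosen, 2))


def objective(sets, points, chosen):
    """r = f + d, the quantity bounded in the proposition."""
    return coverage(sets, chosen) + dispersion(points, chosen)


def greedy(sets, points, k):
    """Add, k times, the item with the largest g(x; chosen)."""
    chosen = []
    for _ in range(k):
        def g(x):
            h = (coverage(sets, chosen + [x]) - coverage(sets, chosen)) / 2
            return h + sum(abs(points[x] - points[y]) for y in chosen)
        rest = [x for x in range(len(points)) if x not in chosen]
        chosen.append(max(rest, key=g))
    return chosen


def brute_force(sets, points, k):
    """Exhaustive optimum, feasible only for tiny instances."""
    return max(itertools.combinations(range(len(points)), k),
               key=lambda S: objective(sets, points, list(S)))


# Item i has position points[i] and covers the elements in sets[i].
points = [0.0, 1.0, 2.5, 4.0, 7.0, 9.0]
sets = [{1, 2}, {2, 3}, {3}, {4, 5, 6}, {1, 6}, {7}]
picked = greedy(sets, points, 3)
best = brute_force(sets, points, 3)
```

With \(\alpha = 1\) and \(c = 1\), the bound proved above says that the objective value of `picked` is at least half that of `best`; on a tiny instance like this one the two can be compared directly.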

Appendix: Proof of Proposition 4

To prove the proposition we need to first show that Modify does not decrease the gain of a set significantly.

Lemma 2

Assume a graph \(G = (V,\,E).\) Assume a collection of k distinct subgraphs \(\mathcal {W}\) of G, and let \(U \in \mathcal {W}.\) Assume that \(k < {\left| V\right| }\) and G contains more than k wedges, i.e., connected subgraphs of size 3. Let \(M = \mathsf{{Modify}} (U,\, G,\, \mathcal {W},\, \lambda ).\) Then \( \chi \mathopen {}\left( M;\,\mathcal {W},\,\lambda \right) \ge 2/5 \times ( \chi \mathopen {}\left( U;\,\mathcal {W},\,\lambda \right) +\lambda ).\)

Proof

Write \(r = {\left| U\right| }\) and \(\alpha = \frac{r}{r + 1}.\) We split the proof into two cases. Case 1: assume that X, as given in Algorithm 3, is not empty. Select \(B \in X.\) We will show that

$$\begin{aligned} \mathrm {dens} \mathopen {}\left( B\right) \ge \alpha \mathrm {dens} \mathopen {}\left( U\right) \quad \text {and}\quad D \mathopen {}\left( B,\,W\right) \ge \alpha ( D \mathopen {}\left( U,\, W\right) + I[U = W]), \end{aligned}$$

for any \(W \in \mathcal {W},\) where \(I[U = W] = 1\) if \(U = W,\) and 0 otherwise. This automatically guarantees that

$$\begin{aligned} \chi \mathopen {}\left( B;\, \mathcal {W},\, \lambda \right) \ge \alpha ( \chi \mathopen {}\left( U;\, \mathcal {W},\, \lambda \right) + \lambda ), \end{aligned}$$

proving the result since \(\alpha \ge 1/2\) and the gain of M is at least as good as the gain of B.

To prove the first inequality, note that

$$\begin{aligned} \mathrm {dens} \mathopen {}\left( B\right) = \frac{{\left| E(B)\right| }}{r + 1} \ge \frac{{\left| E(U)\right| }}{r + 1} = \alpha \frac{{\left| E(U)\right| }}{r} = \alpha \mathrm {dens} \mathopen {}\left( U\right) . \end{aligned}$$

To prove the second inequality fix \(W \in \mathcal {W},\) and let \(p = {\left| W\right| },\,q = {\left| W \cap U\right| }.\) Define

$$\begin{aligned} \varDelta = D \mathopen {}\left( U,\,W\right) + I[U = W] = 2 - \frac{q^2}{rp} = \frac{2rp - q^2}{rp}. \end{aligned}$$

Let v be the only vertex in \(B \setminus U.\) If \(v \notin W,\) then \( D \mathopen {}\left( B,\,W\right) \ge \varDelta .\) Hence, we can assume that \(v \in W.\) This leads to

$$\begin{aligned} \begin{aligned} D \mathopen {}\left( B,\, W\right)&= 2 - \frac{{\left| B \cap W\right| }^2}{{\left| B\right| }{\left| W\right| }} \\&= 2 - \frac{ (1 + q)^2}{(1 + r)p} = \frac{2p(1 + r) - (1 + q)^2}{(1 + r)p}. \end{aligned} \end{aligned}$$

Let us define \(\beta \) as the fraction of the numerators,

$$\begin{aligned} \beta = \frac{2p(1 + r) - (1 + q)^2}{2rp - q^2}. \end{aligned}$$

We wish to show that \(\beta \ge 1.\) Since \(p \ge q + 1,\)

$$\begin{aligned} \begin{aligned} \beta&= \frac{2p(1 + r) - (1 + q)^2}{2rp - q^2} = \frac{2rp - q^2 + 2p - 2q - 1}{2rp - q^2} \\&\ge \frac{2rp - q^2 + 2(q + 1) - 2q - 1}{2rp - q^2} = \frac{2rp - q^2 + 1}{2rp - q^2} \ge 1. \end{aligned} \end{aligned}$$

The ratio of distances is now

$$\begin{aligned} \frac{ D \mathopen {}\left( B,\,W\right) }{\varDelta } = \beta \frac{r}{r + 1} \ge \frac{r}{r + 1} = \alpha . \end{aligned}$$

This proves the first case.
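The distance bound of Case 1 can also be checked by brute force. The sketch below uses the closed-form expressions from the proof, \(\varDelta = 2 - q^2/(rp)\) and \(D(B,\,W) = 2 - (1 + q)^2/((1 + r)p)\), and enumerates small integer values with \(q \le \min (r,\,p - 1)\). The function name `case1_ratio` and the enumeration range are assumptions made for illustration, and the special case \(B = W\) is not treated separately.

```python
from fractions import Fraction


def case1_ratio(r, p, q):
    """D(B, W) / Delta when B = U plus one vertex v, with v in W.

    r = |U|, p = |W|, q = |U & W|; exact rational arithmetic.
    """
    delta = 2 - Fraction(q * q, r * p)
    d_b = 2 - Fraction((q + 1) ** 2, (r + 1) * p)
    return d_b / delta


# v lies in W but not in U, hence p >= q + 1; also q <= r.
bound_holds = all(
    case1_ratio(r, p, q) >= Fraction(r, r + 1)
    for r in range(1, 25)
    for p in range(1, 25)
    for q in range(0, min(r, p - 1) + 1)
)
```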

Case 2: assume that \(X = \emptyset .\) Then we must have \(Y \ne \emptyset \) and \(r \ge 2,\) as otherwise \({\left| \mathcal {W}\right| } \ge {\left| V\right| },\) which violates the assumption of the lemma.

Assume that \( \mathrm {dens} \mathopen {}\left( U\right) \ge 5/3.\) Let \(B \in Y.\) Removing a single item of U decreases the density by 1, at most. This gives us

$$\begin{aligned} \frac{ \mathrm {dens} \mathopen {}\left( B\right) }{ \mathrm {dens} \mathopen {}\left( U\right) } \ge \frac{ \mathrm {dens} \mathopen {}\left( U\right) - 1}{ \mathrm {dens} \mathopen {}\left( U\right) } \ge \frac{5/3 - 1}{5 / 3} = \frac{2}{5}. \end{aligned}$$

To bound the distance term, fix \(W \in \mathcal {W},\) and let \(p = {\left| W\right| },\,q = {\left| W \cap U\right| }.\) Let v be the only vertex in \(U \setminus B.\) Define \(\varDelta = D \mathopen {}\left( U,\,W\right) + I[U = W].\) If \(v \in W,\) then we can easily show that \( D \mathopen {}\left( B,\,W\right) \ge \varDelta .\) Hence, assume that \(v \notin W.\) This implies that \(q \le \min (p,\,r - 1),\) and hence \(q^2 \le p(r - 1).\) As before, we can express the distance term as

$$\begin{aligned} \varDelta = 2 - \frac{q^2}{rp} = \frac{2rp - q^2}{rp}, \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} D \mathopen {}\left( B,\, W\right)&= 2 - \frac{{\left| B \cap W\right| }^2}{{\left| B\right| }{\left| W\right| }} = 2 - \frac{ q^2}{(r - 1)p} = \frac{2p(r - 1) - q^2}{(r - 1)p}. \end{aligned} \end{aligned}$$

The ratio is then

$$\begin{aligned} \begin{aligned} \frac{ D \mathopen {}\left( B,\, W\right) }{\varDelta }&= \frac{2p(r - 1) - q^2}{2rp - q^2}\frac{r}{r - 1} \\&\ge \frac{2p(r - 1) - p(r - 1)}{2rp - p(r - 1)}\frac{r}{r - 1} = \frac{p(r - 1)}{rp + p}\frac{r}{r - 1} = \frac{r}{r + 1} \ge 1/2, \end{aligned} \end{aligned}$$

where the first inequality follows from \(q^2 \le p(r - 1)\) and the fact that the ratio is decreasing as a function of q.

Assume that \( \mathrm {dens} \mathopen {}\left( U\right) < 5/ 3.\) By assumption there is a wedge B outside \(\mathcal {W}.\) Since \( \mathrm {dens} \mathopen {}\left( B\right) \ge 2/3,\) we have \( \mathrm {dens} \mathopen {}\left( B\right) / \mathrm {dens} \mathopen {}\left( U\right) \ge 2 / 5.\) Each distance term shrinks by at most a factor of 2, since

$$\begin{aligned} D \mathopen {}\left( U,\, W\right) \le 2 = 2 \times 1 \le 2 D \mathopen {}\left( B,\, W\right) . \end{aligned}$$

Combining the inequalities proves that

$$\begin{aligned} \chi \mathopen {}\left( B;\, \mathcal {W},\, \lambda \right) \ge \frac{2}{5} \chi \mathopen {}\left( U;\, \mathcal {W},\, \lambda \right) , \end{aligned}$$

which proves the lemma. \(\square \)

Proof of Proposition 4

To prove the proposition, we will first form a new graph H,  and show that the density of a subgraph in H is closely related to the gain. This then allows us to prove the statement.

Let us first construct the graph H. Given a vertex v, let us define

$$\begin{aligned} s(v) = {-}\sum _{j \mid v \in W_j} \frac{2\lambda }{{\left| W_j\right| }}. \end{aligned}$$

Let \(H = (V,\, E^{\prime },\, c)\) be a fully connected weighted graph with self-loops where the weight of an edge \(c(v,\, w)\) is

$$\begin{aligned} c(v,\, w) = I[(v,\, w) \in E] - \sum _{j \mid v,\, w \in W_j} \frac{4 \lambda }{{\left| W_j\right| }}, \end{aligned}$$

for \(v \ne w,\) and \(c(v,\, v) = s(v).\)

Next, we connect the gain of a set of vertices U (w.r.t. G) with the weighted density of U in H. Given an arbitrary set of vertices U, we will write c(U) to mean the total weight of the edges of H with both endpoints in U, self-loops included. Each \(c(v,\, w),\) for \(v \ne w,\) participates in \(\deg _{H}(v;\, U)\) and \(\deg _{H}(w;\, U),\) and each \(c(v,\, v) = s(v)\) participates (once) in \(\deg _{H}(v;\, U).\) This leads to

$$\begin{aligned} 2c(U) = \sum _{v \in U} \left[ \deg _{H}(v;\, U) + s(v)\right] . \end{aligned}$$

We can express the (weighted) degree of a vertex in H as

$$\begin{aligned} \begin{aligned} \deg _{H}(v;\, U)&= s(v) + \sum _{\begin{array}{c} w \in U \\ w \ne v \end{array}} c(v,\, w) = \deg _G (v ;\, U) -\sum _{j \mid v \in W_j} \frac{2\lambda }{{\left| W_j\right| }} \\&\quad - \sum _{\begin{array}{c} w \in U \\ w \ne v \end{array}} \sum _{j \mid v,\, w \in W_j} \frac{4 \lambda }{{\left| W_j\right| }} \\&= \deg _G (v ;\, U) - \lambda \sum _{j \mid v \in W_j} \frac{4{\left| U \cap W_j\right| } - 2}{{\left| W_j\right| }}. \end{aligned} \end{aligned}$$
(3)

Write \(k = {\left| \mathcal {W}\right| }.\) These equalities lead to the following identity,

$$\begin{aligned} \begin{aligned} \mathrm {dens} \mathopen {}\left( U;\,H\right) + 4\lambda k&= \frac{1}{{\left| U\right| }}c(U) + 4\lambda k \\&= 4\lambda k + \frac{1}{2{\left| U\right| }} \sum _{v \in U} \left[ \deg _{H}(v;\, U) + s(v)\right] \\&= 4\lambda k + \mathrm {dens} \mathopen {}\left( U;\, G\right) - \frac{1}{2{\left| U\right| }} \sum _{v \in U} \lambda \sum _{j \mid v \in W_j} \frac{4{\left| U \cap W_j\right| }}{{\left| W_j\right| }} \\&= \mathrm {dens} \mathopen {}\left( U;\, G\right) + 2\lambda \sum _{j = 1}^k \left( 2 - \frac{{\left| U \cap W_j\right| }^2}{{\left| U\right| }{\left| W_j\right| }}\right) \\&= 2 \chi \mathopen {}\left( U;\, \mathcal {W},\, \lambda \right) + \epsilon (U,\, \mathcal {W}), \\ \end{aligned} \end{aligned}$$
(4)

where \(\epsilon (U,\, \mathcal {W})\) is a correction term, equal to \(2\lambda \) if \(U \in \mathcal {W},\) and 0 otherwise.
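The relation between the first and last expressions of Eq. (4) can be verified on a toy instance. The sketch below builds the edge weights of H exactly as defined above and compares \(\mathrm {dens}(U;\,H) + 4\lambda k\) with \(2\chi (U;\,\mathcal {W},\,\lambda ) + \epsilon (U,\,\mathcal {W})\). The gain is not defined in this appendix, so the code assumes \(\chi (U;\,\mathcal {W},\,\lambda ) = \mathrm {dens}(U;\,G)/2 + \lambda \sum _j D(U,\,W_j)\), with \(D(U,\,W) = 2 - {\left| U \cap W\right| }^2/({\left| U\right| }{\left| W\right| })\) for \(U \ne W\) and 0 otherwise, as suggested by the proof of Lemma 2. All function names and the example graph are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def dens_G(U, edges):
    """Density of U in G: induced edges divided by |U|."""
    return Fraction(sum(1 for u, v in edges if u in U and v in U), len(U))


def dens_H(U, edges, W, lam):
    """Weighted density of U in H, self-loops s(v) included."""
    edge_set = {frozenset(e) for e in edges}

    def s(v):
        return -sum(2 * lam / len(Wj) for Wj in W if v in Wj)

    def c(v, w):
        penalty = sum(4 * lam / len(Wj) for Wj in W if v in Wj and w in Wj)
        return (1 if frozenset((v, w)) in edge_set else 0) - penalty

    total = sum(c(v, w) for v, w in combinations(sorted(U), 2))
    total += sum(s(v) for v in U)
    return Fraction(total) / len(U)


def gain(U, edges, W, lam):
    """Assumed form of the gain: dens/2 + lam * sum of distances."""
    def D(A, B):
        if A == B:
            return 0
        return 2 - Fraction(len(A & B) ** 2, len(A) * len(B))
    return dens_G(U, edges) / 2 + lam * sum(D(U, Wj) for Wj in W)


def identity_gap(U, edges, W, lam):
    """Left side minus right side of Eq. (4); zero if the identity holds."""
    eps = 2 * lam if U in W else 0
    lhs = dens_H(U, edges, W, lam) + 4 * lam * len(W)
    rhs = 2 * gain(U, edges, W, lam) + eps
    return lhs - rhs


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (1, 4)]
W = [frozenset({0, 1, 2}), frozenset({2, 3, 4})]
lam = Fraction(1, 3)
gap_inside = identity_gap(frozenset({0, 1, 2}), edges, W, lam)   # U in W
gap_outside = identity_gap(frozenset({1, 2, 3}), edges, W, lam)  # U not in W
```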

Let O be the densest subgraph in H. Next we show that during the for-loop Peel finds a graph whose density is close to \( \mathrm {dens} \mathopen {}\left( O;\, H\right) .\) Let o be the first vertex in O deleted by Peel. We must have

$$\begin{aligned} \deg _H(o;\, O) \ge \mathrm {dens} \mathopen {}\left( O;\, H\right) , \end{aligned}$$

as otherwise we can delete o from O and obtain a better solution. Let \(R = V_i\) be the graph at the moment when o is about to be removed. Let us compare \(\deg _H(o;\,O)\) and \(\deg _H(o;\, R).\) We can lower-bound the second term of the right-hand side in Eq. (3) by \({-}4k\lambda - s(v).\) Since \(O \subseteq R,\) this gives us

$$\begin{aligned} \begin{aligned} \deg _H(o;\, O)&\le \deg _G(o;\, O) \le \deg _G(o;\, R) \\&\le \deg _H(o;\, R) + s(o) + 4k\lambda . \end{aligned} \end{aligned}$$

To upper-bound the first two terms, note that by definition of Peel, the vertex o has the smallest \(\deg _H(o;\, R) + s(o)\) among all the vertices in R. Hence,

$$\begin{aligned} \deg _H(o;\, R) + s(o) \le \sum _{v \in R} \frac{\deg _H(v;\, R) + s(v)}{{\left| R\right| }} = 2\frac{c(R)}{{\left| R\right| }} = 2 \mathrm {dens} \mathopen {}\left( R;\, H\right) . \end{aligned}$$

To complete the proof, let \(O^{\prime }\) be the graph outside \(\mathcal {W},\) maximizing the gain. Due to Eq. (4), we have

$$\begin{aligned} \begin{aligned} 2 \chi \mathopen {}\left( O^{\prime };\, \mathcal {W},\, \lambda \right)&= \mathrm {dens} \mathopen {}\left( O^{\prime };\, H\right) + 4k\lambda \le \mathrm {dens} \mathopen {}\left( O;\, H\right) + 4k\lambda \\&\le 2 \mathrm {dens} \mathopen {}\left( R;\, H\right) + 8k\lambda = 2( \mathrm {dens} \mathopen {}\left( R;\, H\right) + 4k\lambda ) \\&= 4 \chi \mathopen {}\left( R;\, \mathcal {W},\, \lambda \right) + 2\epsilon (R;\, \mathcal {W}). \end{aligned} \end{aligned}$$

Let S be the set returned by Peel.

If \(R \notin \mathcal {W},\) then \(\epsilon (R;\, \mathcal {W}) = 0.\) Moreover, R is not modified, and is one of the graphs that is tested for gain. Consequently, \( \chi \mathopen {}\left( S;\, \mathcal {W}\right) \ge \chi \mathopen {}\left( R;\, \mathcal {W}\right) ,\) proving the statement.

If \(R \in \mathcal {W},\) then it is modified by Modify to, say, \(R^{\prime }.\) Since \(\epsilon (R;\, \mathcal {W}) = 2\lambda \) in this case, Lemma 2 implies that \(5/2 \times \chi \mathopen {}\left( R^{\prime };\, \mathcal {W}\right) \ge \chi \mathopen {}\left( R;\, \mathcal {W}\right) + \epsilon (R;\, \mathcal {W})/2.\) Since \( \chi \mathopen {}\left( S;\, \mathcal {W}\right) \ge \chi \mathopen {}\left( R^{\prime };\, \mathcal {W}\right) ,\) this completes the proof. \(\square \)

About this article

Cite this article

Galbrun, E., Gionis, A. & Tatti, N. Top-k overlapping densest subgraphs. Data Min Knowl Disc 30, 1134–1165 (2016). https://doi.org/10.1007/s10618-016-0464-z

  • DOI: https://doi.org/10.1007/s10618-016-0464-z
