Top-k overlapping densest subgraphs

Galbrun, Esther; Gionis, Aristides; Tatti, Nikolaj

doi:10.1007/s10618-016-0464-z

Published: 26 May 2016

Volume 30, pages 1134–1165, (2016)

Data Mining and Knowledge Discovery

Esther Galbrun¹,
Aristides Gionis² &
Nikolaj Tatti²

Abstract

Finding dense subgraphs is an important problem in graph mining and has many practical applications. At the same time, while large real-world networks are known to have many communities that are not well-separated, the majority of the existing work focuses on the problem of finding a single densest subgraph. Hence, it is natural to consider the question of finding the top-k densest subgraphs. One major challenge in addressing this question is how to handle overlaps: eliminating overlaps completely is one option, but this may lead to extracting subgraphs that are not as dense as would be possible if a limited amount of overlap were allowed. Furthermore, overlaps are desirable, as in most real-world graphs there are vertices that belong to more than one community, and thus to more than one densest subgraph. In this paper we study the problem of finding top-k overlapping densest subgraphs, and we present a new approach that improves over the existing techniques, both in theory and in practice. First, we reformulate the problem definition in a way that allows us to obtain an algorithm with a constant-factor approximation guarantee. Our approach relies on techniques for solving the max-sum diversification problem, which, however, we need to extend in order to make them applicable to our setting. Second, we evaluate our algorithm on a collection of benchmark datasets and show that it convincingly outperforms the previous methods, both in terms of quality and efficiency.

Notes

Here we use the fact that edges are not weighted, and consequently the queue can be implemented as an array of linked lists of vertices.
http://research.ics.aalto.fi/dmg/dos_code.tgz.
The synthetic networks used in our experiments are available at http://research.ics.aalto.fi/dmg/dos_synth.tgz.
http://dblp.uni-trier.de/xml/.
Namely, S. Abiteboul, E. Demaine, M. Ester, C. Faloutsos, J. Han, G. Karypis, J. Kleinberg, H. Mannila, K. Mehlhorn, C. Papadimitriou, B. Shneiderman, G. Weikum and P. Yu.
http://snap.stanford.edu.
Namely, Oceania, Latin-America, the USA, Europe, the Middle-East and East Asia.
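The first note refers to a standard way of peeling an unweighted graph: degrees are integers between 0 and n − 1, so the priority queue can be an array indexed by degree, each entry holding the vertices of that degree. The following is a minimal sketch under stated assumptions: the input is an adjacency-set dictionary, Python sets stand in for the linked lists (both allow constant-time removal), the name `peel_densest` is hypothetical, and the sketch peels for plain edge density |E(S)|/|S| rather than for the gain used in the paper.

```python
def peel_densest(adj):
    """Greedy peeling for the density |E(S)| / |S| using a bucket queue.

    adj: dict mapping each vertex to the set of its neighbours
         (undirected graph, no self-loops).
    Returns (best_density, best_vertex_set).
    """
    n = len(adj)
    if n == 0:
        return 0.0, set()
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    buckets = [set() for _ in range(n)]  # buckets[d]: alive vertices of degree d
    for v, d in deg.items():
        buckets[d].add(v)
    alive = set(adj)
    edges = sum(deg.values()) // 2
    best_density, best_size = edges / n, n
    removal_order = []
    low = 0  # no alive vertex has degree below `low`
    while alive:
        while not buckets[low]:
            low += 1
        v = buckets[low].pop()  # a vertex of minimum degree
        alive.remove(v)
        removal_order.append(v)
        edges -= deg[v]
        for u in adj[v]:
            if u in alive:  # move each alive neighbour one bucket down
                buckets[deg[u]].remove(u)
                deg[u] -= 1
                buckets[deg[u]].add(u)
                low = min(low, deg[u])
        if alive and edges / len(alive) > best_density:
            best_density, best_size = edges / len(alive), len(alive)
    # The best set is what remained after the first n - best_size removals.
    best_set = set(adj) - set(removal_order[: n - best_size])
    return best_density, best_set
```

Each removal decreases `low` by at most one, so apart from constant-time bucket moves per edge the whole peeling runs in O(|V| + |E|) time.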

Author information

Authors and Affiliations

Inria Nancy – Grand Est, Villers-lès-Nancy, France
Esther Galbrun
Helsinki Institute for Information Technology (HIIT) and Department of Computer Science, Aalto University, Helsinki, Finland
Aristides Gionis & Nikolaj Tatti

Corresponding author

Correspondence to Nikolaj Tatti.

Additional information

Responsible editor: Thomas Gärtner, Mirco Nanni, Andrea Passerini and Celine Robardet.

Appendices

Appendix: Proof of Proposition 1

Let us first define $h(x;\,Y) = \left[ f(x \cup Y) - f(Y)\right] / 2$ and

$$\begin{aligned} g(x;\,Y) = h(x;\,Y) + d \mathopen {}\left( Y \cup x\right) - d \mathopen {}\left( Y\right) = h(x;\,Y) + d \mathopen {}\left( x,\,Y\right) . \end{aligned}$$
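The quantity $g(x;\,Y)$ is the marginal gain that the Greedy procedure analyzed in this appendix maximizes at each step. A minimal sketch, assuming exact maximization of g (the case α = 1 in the proof), a set function f given as a Python callable with f(∅) = 0, and a symmetric distance `pair_dist` such that d(x, Y) is the sum of distances from x to the items of Y; the names `greedy_max_sum` and `objective` are hypothetical.

```python
from itertools import combinations

def greedy_max_sum(items, k, f, pair_dist):
    """Pick k items greedily for max f(S) + d(S).

    f: monotone submodular set function on frozensets, with f(frozenset()) == 0.
    pair_dist: symmetric distance; d(S) sums it over unordered pairs of S.
    Each step adds the x maximizing g(x; Y) = [f(Y ∪ {x}) - f(Y)] / 2 + d(x, Y).
    """
    items = list(items)
    if k > len(items):
        raise ValueError("k exceeds the number of items")
    chosen = []
    for _ in range(k):
        current = frozenset(chosen)
        base = f(current)

        def gain(x):
            h = (f(current | {x}) - base) / 2  # h(x; Y)
            return h + sum(pair_dist(x, y) for y in chosen)  # + d(x, Y)

        best = max((x for x in items if x not in current), key=gain)
        chosen.append(best)
    return chosen

def objective(S, f, pair_dist):
    """r(S) = f(S) + d(S)."""
    return f(frozenset(S)) + sum(pair_dist(a, b) for a, b in combinations(S, 2))
```

With a nonnegative modular f and a metric distance (c = 1), the bound proved below with α = 1 says the greedy set scores at least half of the optimum of f + d, which is easy to compare against brute force on tiny inputs.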

For proving the proposition, we will need Lemma 1.

Lemma 1

Let $ d $ be a c-relaxed metric. Let X and Y be two disjoint sets. Then

$$\begin{aligned} c({\left| X\right| } - 1) d \mathopen {}\left( X,\,Y\right) \ge {\left| Y\right| } d \mathopen {}\left( X\right) . \end{aligned}$$

Proof

Let $y \in Y$ and $x,\,z \in X.$ By definition,

$$\begin{aligned} c( d \mathopen {}\left( x,\,y\right) + d \mathopen {}\left( z,\,y\right) ) \ge d \mathopen {}\left( x,\,z\right) . \end{aligned}$$

For a given $x \in X,$ there are exactly ${\left| X\right| } - 1$ pairs $(x,\,z)$ such that $x \ne z \in X.$ Consequently, summing over all $x,\,z \in X$ such that $x \ne z$ gives us

$$\begin{aligned} 2c({\left| X\right| } - 1) d \mathopen {}\left( X,\,y\right) \ge 2 d \mathopen {}\left( X\right) . \end{aligned}$$

Summing over $y \in Y$ proves the lemma. $\square $
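Lemma 1 can be sanity-checked numerically. The sketch below assumes Euclidean distances between random points, which form a metric and hence a c-relaxed metric with c = 1 in the sense used in the proof, $c(d(x,\,y) + d(z,\,y)) \ge d(x,\,z)$. As in the counting above, d(X) sums over unordered pairs inside X and d(X, Y) over pairs across X and Y; the helper names are hypothetical.

```python
import math
import random
from itertools import combinations

def d_within(X, dist):
    # d(X): sum over unordered pairs {x, z} of X
    return sum(dist(x, z) for x, z in combinations(X, 2))

def d_between(X, Y, dist):
    # d(X, Y): sum over pairs (x, y) with x in X and y in Y
    return sum(dist(x, y) for x in X for y in Y)

def check_lemma1(X, Y, dist, c=1.0):
    """True if c(|X| - 1) d(X, Y) >= |Y| d(X) holds, up to rounding."""
    lhs = c * (len(X) - 1) * d_between(X, Y, dist)
    rhs = len(Y) * d_within(X, dist)
    return lhs >= rhs - 1e-9

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(12)]
X, Y = points[:5], points[5:]  # two disjoint sets of points
print(check_lemma1(X, Y, math.dist))
```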

Proof of Proposition 1

Let $\emptyset = G_0 \subset G_1 \subset \cdots \subset G_k$ be the sets produced by Greedy. Fix $0 \le i \le k - 1.$ Then $G_i$ is the current solution after the ith iteration of Greedy, and ${\left| G_i\right| } = i.$

Let O be the optimal solution, so that ${\left| O\right| } = k.$ Write $A = O \cap G_i,\,C = O \setminus A,$ and $B = G_i \setminus A.$ Applying Lemma 1 with $X = A$ and $Y = C$ gives

$$\begin{aligned} c({\left| A\right| } - 1) d \mathopen {}\left( A,\,C\right) \ge {\left| C\right| } d \mathopen {}\left( A\right) , \end{aligned}$$

which in turn, since $c \ge 1$ and ${\left| A\right| } + {\left| C\right| } = k,$ implies

$$\begin{aligned} \begin{aligned} {\left| C\right| }i( d \mathopen {}\left( A\right) + d \mathopen {}\left( A,\,C\right) )&\le ci({\left| A\right| } - 1) d \mathopen {}\left( A,\,C\right) + {\left| C\right| }i d \mathopen {}\left( A,\,C\right) \\&\le ci({\left| A\right| } - 1 + {\left| C\right| }) d \mathopen {}\left( A,\,C\right) \\&= ci(k - 1) d \mathopen {}\left( A,\,C\right) . \\ \end{aligned} \end{aligned}$$

Moreover, applying Lemma 1 with $X = C$ and $Y = B,$ and with $X = C$ and $Y = A,$ gives

$$\begin{aligned} \begin{aligned} c({\left| C\right| } - 1) d \mathopen {}\left( B,\,C\right)&\ge {\left| B\right| } d \mathopen {}\left( C\right) , \\ c({\left| C\right| } - 1) d \mathopen {}\left( A,\,C\right)&\ge {\left| A\right| } d \mathopen {}\left( C\right) , \\ \end{aligned} \end{aligned}$$

which, together with ${\left| C\right| } = {\left| B\right| } + k - i$ and ${\left| A\right| } + {\left| B\right| } = i,$ implies

$$\begin{aligned} \begin{aligned} {\left| C\right| }i d \mathopen {}\left( C\right)&= (k - i)i d \mathopen {}\left( C\right) + {\left| B\right| }i d \mathopen {}\left( C\right) \\&= (k - i)({\left| A\right| } + {\left| B\right| }) d \mathopen {}\left( C\right) + {\left| B\right| }i d \mathopen {}\left( C\right) \\&= (k - i){\left| A\right| } d \mathopen {}\left( C\right) + {\left| B\right| }k d \mathopen {}\left( C\right) \\&\le c(k - i)({\left| C\right| } - 1) d \mathopen {}\left( A,\,C\right) + ck({\left| C\right| } - 1) d \mathopen {}\left( B,\,C\right) \\&\le c(k - i)(k - 1) d \mathopen {}\left( A,\, C\right) + ck(k - 1) d \mathopen {}\left( B,\,C\right) .\\ \end{aligned} \end{aligned}$$

Combining these two inequalities leads us to

$$\begin{aligned} \begin{aligned} {\left| C\right| }i d \mathopen {}\left( O\right)&= {\left| C\right| }i d \mathopen {}\left( A\right) + {\left| C\right| }i d \mathopen {}\left( C\right) + {\left| C\right| }i d \mathopen {}\left( A,\,C\right) \\&\le ck(k - 1)( d \mathopen {}\left( A,\,C\right) + d \mathopen {}\left( B,\, C\right) ) \\&= ck(k - 1)d\left( G_i,\,C\right) . \\ \end{aligned} \end{aligned}$$

Submodularity and monotonicity of f, combined with the previous inequality, imply

$$\begin{aligned} \begin{aligned} \sum _{v \in C} g\left( v;\,G_i\right)&= \sum _{v \in C} \left[ h\left( v;\,G_i\right) + d\left( \{v\},\, G_i\right) \right] \\&= \left( \sum _{v \in C} h\left( v;\, G_i\right) \right) + d\left( C,\,G_i\right) \\&\ge \frac{1}{2}\left[ f(O) - f\left( G_i\right) \right] + \frac{i{\left| C\right| }}{ck(k - 1)} d \mathopen {}\left( O\right) \\&\ge \frac{1}{2}\left[ f(O) - f\left( G_k\right) \right] + \frac{i{\left| C\right| }}{ck(k - 1)} d \mathopen {}\left( O\right) . \end{aligned} \end{aligned}$$

Let $u_i$ be the item added at the $(i + 1)$th step, so that $G_{i + 1} = \left\{ u_i\right\} \cup G_i.$ Then, since $g(u_i;\,G_i) \ge \alpha g(v;\,G_i)$ for any $v \in C,$ averaging over $v \in C$ and using ${\left| C\right| } \le k$ gives

$$\begin{aligned} g\left( u_i;\, G_i\right) \ge \frac{\alpha }{2k}\left[ f(O) - f\left( G_k\right) \right] + \frac{i\alpha }{ck(k - 1)} d \mathopen {}\left( O\right) . \end{aligned}$$

Summing over i gives us

$$\begin{aligned} \frac{1}{2}f\left( G_k\right) + d\left( G_k\right) = \sum _{i = 0}^{k - 1}g\left( u_i;\, G_i\right) \ge \frac{\alpha }{2}\left[ f(O) - f\left( G_k\right) \right] + \frac{\alpha }{2c} d \mathopen {}\left( O\right) . \end{aligned}$$
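The summation step combines a telescoping sum with an arithmetic series. Assuming, as the left-hand identity requires, that $G_0 = \emptyset $ and $f(\emptyset ) = 0,$ the terms expand as follows; every unordered pair of $G_k$ is counted exactly once in the distance sum, namely when its later element is added:

$$\begin{aligned} \sum _{i = 0}^{k - 1} g\left( u_i;\, G_i\right)&= \sum _{i = 0}^{k - 1} \frac{f\left( G_{i + 1}\right) - f\left( G_i\right) }{2} + \sum _{i = 0}^{k - 1} d \mathopen {}\left( u_i,\, G_i\right) = \frac{1}{2}f\left( G_k\right) + d\left( G_k\right) , \\ \sum _{i = 0}^{k - 1} \frac{i\alpha }{ck(k - 1)} d \mathopen {}\left( O\right)&= \frac{\alpha }{ck(k - 1)} \cdot \frac{k(k - 1)}{2} d \mathopen {}\left( O\right) = \frac{\alpha }{2c} d \mathopen {}\left( O\right) . \end{aligned}$$

The remaining term $\frac{\alpha }{2k}\left[ f(O) - f\left( G_k\right) \right] $ does not depend on i, so its k copies sum to $\frac{\alpha }{2}\left[ f(O) - f\left( G_k\right) \right] .$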

Since $\alpha \le 1$ and $c \ge 1,$ we have

$$\begin{aligned} r\left( G_k\right) = f\left( G_k\right) + d\left( G_k\right) \ge \frac{\alpha }{2}f(O) + \frac{\alpha }{2c} d \mathopen {}\left( O\right) \ge \frac{\alpha }{2c} r \mathopen {}\left( O\right) , \end{aligned}$$

which completes the proof. $\square $

Proof of Proposition 4

To prove the proposition, we first need to show that Modify does not significantly decrease the gain of a set.

Lemma 2

Assume a graph $G = (V,\,E),$ a collection $\mathcal {W}$ of k distinct subgraphs of G, and $U \in \mathcal {W}.$ Assume that $k < {\left| V\right| }$ and that G contains more than k wedges, i.e., connected subgraphs of size 3. Let $M = \mathsf{{Modify}} (U,\, G,\, \mathcal {W},\, \lambda ).$ Then $ \chi \mathopen {}\left( M;\,\mathcal {W},\,\lambda \right) \ge 2/5 \times ( \chi \mathopen {}\left( U;\,\mathcal {W},\,\lambda \right) +\lambda ).$

Proof

Write $r = {\left| U\right| }$ and $\alpha = \frac{r}{r + 1}.$ We split the proof into two cases. Case 1: assume that X, as given in Algorithm 3, is not empty. Select $B \in X.$ We will show that

$$\begin{aligned} \mathrm {dens} \mathopen {}\left( B\right) \ge \alpha \mathrm {dens} \mathopen {}\left( U\right) \quad \text {and}\quad D \mathopen {}\left( B,\,W\right) \ge \alpha ( D \mathopen {}\left( U,\, W\right) + I[U = W]), \end{aligned}$$

for any $W \in \mathcal {W},$ where $I[U = W] = 1$ if $U = W,$ and 0 otherwise. This automatically guarantees that

$$\begin{aligned} \chi \mathopen {}\left( B;\, \mathcal {W},\, \lambda \right) \ge \alpha ( \chi \mathopen {}\left( U;\, \mathcal {W},\, \lambda \right) + \lambda ), \end{aligned}$$

proving the result, since $\alpha \ge 1/2 > 2/5$ and the gain of M is at least the gain of B.

To prove the first inequality, note that

$$\begin{aligned} \mathrm {dens} \mathopen {}\left( B\right) = \frac{{\left| E(B)\right| }}{r + 1} \ge \frac{{\left| E(U)\right| }}{r + 1} = \alpha \frac{{\left| E(U)\right| }}{r} = \alpha \mathrm {dens} \mathopen {}\left( U\right) . \end{aligned}$$

To prove the second inequality fix $W \in \mathcal {W},$ and let $p = {\left| W\right| },\,q = {\left| W \cap U\right| }.$ Define

$$\begin{aligned} \varDelta = D \mathopen {}\left( U,\,W\right) + I[U = W] = 2 - \frac{q^2}{rp} = \frac{2rp - q^2}{rp}. \end{aligned}$$

Let v be the only vertex in $B \setminus U.$ If $v \notin W,$ then ${\left| B \cap W\right| } = q$ while ${\left| B\right| } = r + 1 > r,$ so $ D \mathopen {}\left( B,\,W\right) \ge \varDelta .$ Hence, we can assume that $v \in W.$ This leads to

$$\begin{aligned} \begin{aligned} D \mathopen {}\left( B,\, W\right)&= 2 - \frac{{\left| B \cap W\right| }^2}{{\left| B\right| }{\left| W\right| }} \\&= 2 - \frac{ (1 + q)^2}{(1 + r)p} = \frac{2p(1 + r) - (1 + q)^2}{(1 + r)p}. \end{aligned} \end{aligned}$$

Let us define $\beta $ as the ratio of the two numerators,

$$\begin{aligned} \beta = \frac{2p(1 + r) - (1 + q)^2}{2rp - q^2}. \end{aligned}$$

We wish to show that $\beta \ge 1.$ Since $p \ge q + 1,$

$$\begin{aligned} \begin{aligned} \beta&= \frac{2p(1 + r) - (1 + q)^2}{2rp - q^2} = \frac{2rp - q^2 + 2p - 2q - 1}{2rp - q^2} \\&\ge \frac{2rp - q^2 + 2(q + 1) - 2q - 1}{2rp - q^2} = \frac{2rp - q^2 + 1}{2rp - q^2} \ge 1. \end{aligned} \end{aligned}$$

The ratio of distances is now

$$\begin{aligned} \frac{ D \mathopen {}\left( B,\,W\right) }{\varDelta } = \beta \frac{r}{r + 1} \ge \frac{r}{r + 1} = \alpha . \end{aligned}$$

This proves the first case.
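The Case 1 bound can also be checked by brute force on small ground sets. The sketch assumes the distance implied by the definition of $\varDelta $ above, namely $D(U,\,W) = 2 - {\left| U \cap W\right| }^2/({\left| U\right| }{\left| W\right| })$ for $U \ne W$ and $D(U,\,U) = 0,$ so that $D(U,\,W) + I[U = W]$ equals the same expression in both cases. It enumerates U, W and a vertex $v \notin U,$ sets $B = U \cup \{v\},$ skips $W = B$ (the proof's formula for $D(B,\,W)$ presumes $B \ne W$), and tests $D(B,\,W) \ge \alpha (D(U,\,W) + I[U = W]).$ The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def D(U, W):
    """Assumed distance: 0 if U == W, else 2 - |U ∩ W|^2 / (|U| |W|)."""
    if U == W:
        return Fraction(0)
    return 2 - Fraction(len(U & W) ** 2, len(U) * len(W))

def nonempty_subsets(ground):
    for size in range(1, len(ground) + 1):
        for combo in combinations(ground, size):
            yield frozenset(combo)

def check_case1(n):
    """Check D(B, W) >= alpha * (D(U, W) + I[U = W]) for B = U ∪ {v}, B != W."""
    ground = list(range(n))
    subsets = list(nonempty_subsets(ground))
    for U in subsets:
        r = len(U)
        alpha = Fraction(r, r + 1)
        for v in ground:
            if v in U:
                continue
            B = U | {v}
            for W in subsets:
                if W == B:
                    continue  # the proof's formula for D(B, W) presumes B != W
                delta = D(U, W) + (1 if U == W else 0)
                if D(B, W) < alpha * delta:
                    return False
    return True

print(check_case1(5))
```

Exact fractions are used so that equality cases, such as $\beta = 1,$ are not misjudged by rounding.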

Case 2: assume that $X = \emptyset .$ Then we must have $Y \ne \emptyset $ and $r \ge 2,$ as otherwise ${\left| \mathcal {W}\right| } \ge {\left| V\right| },$ which violates the assumption of the lemma.

Assume first that $ \mathrm {dens} \mathopen {}\left( U\right) \ge 5/3.$ Let $B \in Y.$ Removing a single vertex from U decreases the density by at most 1. This gives us

$$\begin{aligned} \frac{ \mathrm {dens} \mathopen {}\left( B\right) }{ \mathrm {dens} \mathopen {}\left( U\right) } \ge \frac{ \mathrm {dens} \mathopen {}\left( U\right) - 1}{ \mathrm {dens} \mathopen {}\left( U\right) } \ge \frac{5/3 - 1}{5 / 3} = \frac{2}{5}. \end{aligned}$$
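The claim that removing a single vertex decreases the density by at most 1 follows from a one-line computation. Write $B = U \setminus \{v\}$ and $d_v$ for the number of neighbors of v inside U, so that $d_v \le r - 1$ and ${\left| E(B)\right| } = {\left| E(U)\right| } - d_v$:

$$\begin{aligned} \mathrm {dens} \mathopen {}\left( U\right) - \mathrm {dens} \mathopen {}\left( B\right) = \frac{{\left| E(U)\right| }}{r} - \frac{{\left| E(U)\right| } - d_v}{r - 1} = \frac{r d_v - {\left| E(U)\right| }}{r(r - 1)} \le \frac{r(r - 1) - {\left| E(U)\right| }}{r(r - 1)} \le 1. \end{aligned}$$

This uses $r \ge 2,$ which holds in Case 2. The bound on the density ratio then follows because $(x - 1)/x$ is increasing in $x > 0.$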

To bound the distance term, fix $W \in \mathcal {W},$ and let $p = {\left| W\right| },\,q = {\left| W \cap U\right| }.$ Let v be the only vertex in $U \setminus B.$ Define $\varDelta = D \mathopen {}\left( U,\,W\right) + I[U = W].$ If $v \in W,$ then we can easily show that $ D \mathopen {}\left( B,\,W\right) \ge \varDelta .$ Hence, assume that $v \notin W.$ This implies that $q \le \min (p,\,r - 1),$ and hence $q^2 \le p(r - 1).$ As before, we can express the distance term as

$$\begin{aligned} \varDelta = 2 - \frac{q^2}{rp} = \frac{2rp - q^2}{rp}, \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} D \mathopen {}\left( B,\, W\right)&= 2 - \frac{{\left| B \cap W\right| }^2}{{\left| B\right| }{\left| W\right| }} = 2 - \frac{ q^2}{(r - 1)p} = \frac{2p(r - 1) - q^2}{(r - 1)p}. \end{aligned} \end{aligned}$$

The ratio is then

$$\begin{aligned} \begin{aligned} \frac{ D \mathopen {}\left( B,\, W\right) }{\varDelta }&= \frac{2p(r - 1) - q^2}{2rp - q^2}\frac{r}{r - 1} \\&\ge \frac{2p(r - 1) - p(r - 1)}{2rp - p(r - 1)}\frac{r}{r - 1} = \frac{p(r - 1)}{rp + p}\frac{r}{r - 1} = \frac{r}{r + 1} \ge 1/2, \end{aligned} \end{aligned}$$

where the first inequality follows from the fact that the ratio is decreasing as a function of q.
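The monotonicity claim can be made explicit. Set $x = q^2,$ $a = 2p(r - 1)$ and $b = 2rp,$ so that the first factor of the ratio is $(a - x)/(b - x)$ with $a < b.$ Then

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}x} \frac{a - x}{b - x} = \frac{-(b - x) + (a - x)}{(b - x)^2} = \frac{a - b}{(b - x)^2} < 0 \quad \text {for } x < b, \end{aligned}$$

and $x = q^2 \le p(r - 1) < b,$ so the ratio is smallest when $q^2$ takes its largest admissible value $p(r - 1),$ which is exactly the substitution made in the bound.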

Assume now that $ \mathrm {dens} \mathopen {}\left( U\right) < 5/ 3.$ Since G contains more than k wedges, there is a wedge B outside $\mathcal {W}.$ Since $ \mathrm {dens} \mathopen {}\left( B\right) \ge 2/3,$ we have $ \mathrm {dens} \mathopen {}\left( B\right) / \mathrm {dens} \mathopen {}\left( U\right) \ge 2 / 5.$ Each distance term decreases by at most a factor of 2, since

$$\begin{aligned} D \mathopen {}\left( U,\, W\right) + I[U = W] \le 2 = 2 \times 1 \le 2 D \mathopen {}\left( B,\, W\right) . \end{aligned}$$

Combining the inequalities proves that

$$\begin{aligned} \chi \mathopen {}\left( B;\, \mathcal {W},\, \lambda \right) \ge \frac{2}{5} \left( \chi \mathopen {}\left( U;\, \mathcal {W},\, \lambda \right) + \lambda \right) , \end{aligned}$$

which proves the lemma. $\square $
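Two facts used in the last step of Case 2 are short to verify. A wedge B has three vertices and, being connected, induces at least two edges, so $\mathrm {dens} \mathopen {}\left( B\right) \ge 2/3.$ Moreover, B is not in $\mathcal {W},$ so $B \ne W,$ and ${\left| B \cap W\right| } \le \min ({\left| B\right| },\,{\left| W\right| })$ gives

$$\begin{aligned} D \mathopen {}\left( B,\, W\right) = 2 - \frac{{\left| B \cap W\right| }^2}{{\left| B\right| }{\left| W\right| }} \ge 2 - 1 = 1, \qquad D \mathopen {}\left( U,\, W\right) + I[U = W] = 2 - \frac{{\left| U \cap W\right| }^2}{{\left| U\right| }{\left| W\right| }} \le 2. \end{aligned}$$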

Proof of Proposition 4

To prove the proposition, we will first form a new graph H, and show that the density of a subgraph in H is closely related to the gain. This then allows us to prove the statement.

Let us first construct the graph $H.$ Given a vertex v, let us define

$$\begin{aligned} s(v) = {-}\sum _{j \mid v \in W_j} \frac{2\lambda }{{\left| W_j\right| }}. \end{aligned}$$

Let $H = (V,\, E^{\prime },\, c)$ be a fully connected weighted graph with self-loops where the weight of an edge $c(v,\, w)$ is

$$\begin{aligned} c(v,\, w) = I[(v,\, w) \in E] - \sum _{j \mid v,\, w \in W_j} \frac{4 \lambda }{{\left| W_j\right| }}, \end{aligned}$$

for $v \ne w,$ and $c(v,\, v) = s(v).$

Next, we connect the gain of a set of vertices U (w.r.t. G) with the weighted density of U in H. Given an arbitrary set of vertices U, we write c(U) for the total weight of the edges of H induced by U, self-loops included. Each $c(v,\, w),$ for $v \ne w,$ participates in $\deg _{H}(v;\, U)$ and $\deg _{H}(w;\, U),$ and each $c(v,\, v) = s(v)$ participates (once) in $\deg _{H}(v;\, U).$ This leads to

$$\begin{aligned} 2c(U) = \sum _{v \in U} \left( \deg _{H}(v;\, U) + s(v)\right) . \end{aligned}$$

We can express the (weighted) degree of a vertex $v \in U$ in H as

$$\begin{aligned} \begin{aligned} \deg _{H}(v;\, U)&= s(v) + \sum _{\begin{array}{c} w \in U \\ w \ne v \end{array}} c(v,\, w) = \deg _G (v ;\, U) -\sum _{j \mid v \in W_j} \frac{2\lambda }{{\left| W_j\right| }} \\&\quad - \sum _{\begin{array}{c} w \in U \\ w \ne v \end{array}} \sum _{j \mid v,\, w \in W_j} \frac{4 \lambda }{{\left| W_j\right| }} \\&= \deg _G (v ;\, U) - \lambda \sum _{j \mid v \in W_j} \frac{4{\left| U \cap W_j\right| } - 2}{{\left| W_j\right| }}. \end{aligned} \end{aligned} \tag{3}$$

Write $k = {\left| \mathcal {W}\right| }.$ These equalities lead to the following identity,

$$\begin{aligned} \begin{aligned} \mathrm {dens} \mathopen {}\left( U;\,H\right) + 4\lambda k&= \frac{1}{{\left| U\right| }}c(U) + 4\lambda k \\&= 4\lambda k + \frac{1}{2{\left| U\right| }} \sum _{v \in U} \left( \deg _{H}(v;\, U) + s(v)\right) \\&= 4\lambda k + \mathrm {dens} \mathopen {}\left( U;\, G\right) - \frac{1}{2{\left| U\right| }} \sum _{v \in U} \lambda \sum _{j \mid v \in W_j} \frac{4{\left| U \cap W_j\right| }}{{\left| W_j\right| }} \\&= \mathrm {dens} \mathopen {}\left( U;\, G\right) + 2\lambda \sum _{j = 1}^k \left( 2 - \frac{{\left| U \cap W_j\right| }^2}{{\left| U\right| }{\left| W_j\right| }}\right) \\&= 2 \chi \mathopen {}\left( U;\, \mathcal {W},\, \lambda \right) + \epsilon (U,\, \mathcal {W}), \\ \end{aligned} \end{aligned} \tag{4}$$

where $\epsilon (U,\, \mathcal {W})$ is a correction term, equal to 2$\lambda $ if $U \in \mathcal {W},$ and 0 otherwise.
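Identity (4) can be checked numerically. The sketch below builds the weights $c(v,\, w)$ and $s(v)$ of H exactly as defined above for a toy graph G, a toy collection $\mathcal {W}$ and a toy $\lambda $ (all hypothetical), and compares $\mathrm {dens}(U;\,H) + 4\lambda k$ with $\mathrm {dens}(U;\,G) + 2\lambda \sum _j (2 - {\left| U \cap W_j\right| }^2/({\left| U\right| }{\left| W_j\right| }))$ for every non-empty U. Exact fractions avoid rounding; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

def build_H(vertices, edges, collection, lam):
    """Weights of H: c[(v, w)] for v < w, and self-loop weights s[v]."""
    s = {v: -sum(2 * lam / len(W) for W in collection if v in W) for v in vertices}
    c = {}
    for v, w in combinations(sorted(vertices), 2):
        weight = Fraction(1 if frozenset((v, w)) in edges else 0)
        weight -= sum(4 * lam / len(W) for W in collection if v in W and w in W)
        c[(v, w)] = weight
    return c, s

def dens_H(U, c, s):
    # c(U) / |U|: each self-loop once, plus each induced pair once
    total = sum(s[v] for v in U) + sum(c[p] for p in combinations(sorted(U), 2))
    return Fraction(total) / len(U)

def dens_G(U, edges):
    induced = sum(1 for p in combinations(sorted(U), 2) if frozenset(p) in edges)
    return Fraction(induced, len(U))

def rhs_eq4(U, edges, collection, lam):
    # dens(U; G) + 2 * lam * sum_j (2 - |U ∩ W_j|^2 / (|U| |W_j|))
    penalty = sum(2 - Fraction(len(U & W) ** 2, len(U) * len(W)) for W in collection)
    return dens_G(U, edges) + 2 * lam * penalty

vertices = range(6)
edges = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]}
collection = [frozenset({0, 1, 2}), frozenset({2, 3, 4, 5})]
lam = Fraction(1, 3)
c, s = build_H(vertices, edges, collection, lam)
k = len(collection)
all_match = all(
    dens_H(U, c, s) + 4 * lam * k == rhs_eq4(U, edges, collection, lam)
    for size in range(1, 7)
    for U in map(frozenset, combinations(vertices, size))
)
print(all_match)
```

The comparison stops at the penalty form of the right-hand side, so it checks the construction of H independently of how the gain $\chi $ is normalized.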

Let O be the densest subgraph in H. Next, we show that during the for-loop Peel finds a graph whose density is close to $ \mathrm {dens} \mathopen {}\left( O;\, H\right) .$ Let o be the first vertex in O deleted by Peel. We must have

$$\begin{aligned} \deg _H(o;\, O) \ge \mathrm {dens} \mathopen {}\left( O;\, H\right) , \end{aligned}$$

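as otherwise we could delete o from O and obtain a denser subgraph. Let $R = V_i$ be the graph at the moment when o is about to be removed. Let us compare $\deg _H(o;\,O)$ and $\deg _H(o;\, R).$ Since ${\left| U \cap W_j\right| } \le {\left| W_j\right| }$ and v lies in at most k sets $W_j,$ we can lower-bound the second term on the right-hand side of Eq. (3) by ${-}4k\lambda - s(v);$ moreover, this term is nonpositive, since ${\left| U \cap W_j\right| } \ge 1$ whenever $v \in W_j.$ Since $O \subseteq R,$ this gives us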

$$\begin{aligned} \begin{aligned} \deg _H(o;\, O)&\le \deg _G(o;\, O) \le \deg _G(o;\, R) \\&\le \deg _H(o;\, R) + s(o) + 4k\lambda . \end{aligned} \end{aligned}$$

To upper-bound the first two terms on the right-hand side, note that, by the definition of Peel, the vertex o has the smallest $\deg _H(o;\, R) + s(o)$ among all the vertices in R. Hence,

$$\begin{aligned} \deg _H(o;\, R) + s(o) \le \sum _{v \in R} \frac{\deg _H(v;\, R) + s(v)}{{\left| R\right| }} = 2\frac{c(R)}{{\left| R\right| }} = 2 \mathrm {dens} \mathopen {}\left( R;\, H\right) . \end{aligned}$$
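The peeling step analyzed here can be sketched as follows: repeatedly remove the vertex of R with the smallest $\deg _H(v;\, R) + s(v)$ and remember the densest intermediate set. This is a minimal sketch under stated assumptions: H is given as a dictionary `c` of pair weights keyed by sorted pairs plus a dictionary `s` of self-loop weights (a hypothetical layout), ties are broken arbitrarily, a plain linear scan replaces a priority queue, and the Modify step applied to sets already in $\mathcal {W}$ is not reproduced.

```python
def peel_weighted(vertices, c, s):
    """Peeling on a weighted graph H with self-loops.

    c: dict mapping each pair (v, w) with v < w to the weight c(v, w)
    s: dict mapping each vertex v to its self-loop weight s(v) = c(v, v)
    Returns (best_density, best_set), where the density of R is c(R) / |R|.
    """
    def weight(v, w):
        return c[(v, w)] if v < w else c[(w, v)]

    R = set(vertices)
    if not R:
        raise ValueError("H must have at least one vertex")
    # deg_H(v; R) = s(v) + sum of c(v, w) over w in R, w != v
    deg = {v: s[v] + sum(weight(v, w) for w in R if w != v) for v in R}
    # c(R): each self-loop once, each pair once
    total = sum(s[v] for v in R) + sum(weight(v, w) for v in R for w in R if v < w)
    best_density, best_set = total / len(R), set(R)
    while len(R) > 1:
        o = min(R, key=lambda v: deg[v] + s[v])  # smallest deg_H(v; R) + s(v)
        R.remove(o)
        total -= deg[o]  # c(R \ {o}) = c(R) - deg_H(o; R)
        for w in R:
            deg[w] -= weight(o, w)
        if total / len(R) > best_density:
            best_density, best_set = total / len(R), set(R)
    return best_density, best_set
```

With $s \equiv 0$ and 0/1 pair weights the procedure reduces to ordinary peeling for ${\left| E(S)\right| }/{\left| S\right| }$, which makes a convenient first test.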

To complete the proof, let $O^{\prime }$ be the subgraph not in $\mathcal {W}$ that maximizes the gain. Due to Eq. (4), we have

$$\begin{aligned} \begin{aligned} 2 \chi \mathopen {}\left( O^{\prime };\, \mathcal {W},\, \lambda \right)&= \mathrm {dens} \mathopen {}\left( O^{\prime };\, H\right) + 4k\lambda \le \mathrm {dens} \mathopen {}\left( O;\, H\right) + 4k\lambda \\&\le 2 \mathrm {dens} \mathopen {}\left( R;\, H\right) + 8k\lambda = 2( \mathrm {dens} \mathopen {}\left( R;\, H\right) + 4k\lambda ) \\&= 4 \chi \mathopen {}\left( R;\, \mathcal {W},\, \lambda \right) + 2\epsilon (R;\, \mathcal {W}). \end{aligned} \end{aligned}$$

Let S be the set returned by Peel.

If $R \notin \mathcal {W},$ then $\epsilon (R;\, \mathcal {W}) = 0.$ Moreover, R is not modified, and is one of the graphs that is tested for gain. Consequently, $ \chi \mathopen {}\left( S;\, \mathcal {W}\right) \ge \chi \mathopen {}\left( R;\, \mathcal {W}\right) ,$ proving the statement.

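If $R \in \mathcal {W},$ then Modify replaces it with, say, $R^{\prime },$ and $\epsilon (R;\, \mathcal {W}) = 2\lambda .$ Lemma 2 implies that $5/2 \times \chi \mathopen {}\left( R^{\prime };\, \mathcal {W}\right) \ge \chi \mathopen {}\left( R;\, \mathcal {W}\right) + \lambda = \chi \mathopen {}\left( R;\, \mathcal {W}\right) + \epsilon (R;\, \mathcal {W})/2,$ so the previous inequality gives $\chi \mathopen {}\left( O^{\prime };\, \mathcal {W}\right) \le 2\chi \mathopen {}\left( R;\, \mathcal {W}\right) + \epsilon (R;\, \mathcal {W}) \le 5 \chi \mathopen {}\left( R^{\prime };\, \mathcal {W}\right) .$ Since $ \chi \mathopen {}\left( S;\, \mathcal {W}\right) \ge \chi \mathopen {}\left( R^{\prime };\, \mathcal {W}\right) ,$ this completes the proof. $\square $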

About this article

Cite this article

Galbrun, E., Gionis, A. & Tatti, N. Top-k overlapping densest subgraphs. Data Min Knowl Disc 30, 1134–1165 (2016). https://doi.org/10.1007/s10618-016-0464-z

Received: 30 December 2015
Accepted: 05 May 2016
Published: 26 May 2016
Issue Date: September 2016
DOI: https://doi.org/10.1007/s10618-016-0464-z
