
Mining Uncertain Graphs: An Overview

  • Conference paper
Algorithmic Aspects of Cloud Computing (ALGOCLOUD 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10230)


Abstract

Graphs play an important role in the modern world, owing to their widespread use for modeling, representing and organizing linked data. Since most of the “killer” applications require a graph-based representation (e.g., the Web, social network management, protein-protein interaction networks), efficient query processing and analysis techniques are needed, not only because these graphs are massive but also because the operations they must support are complex and require significant computational resources. In many cases, each graph edge e is annotated with a probability value p(e) expressing its existential uncertainty: with probability p(e) the edge is present in the graph, and with probability \(1-p(e)\) it is absent. This gives rise to the concept of probabilistic graphs (also known as uncertain graphs). Formally, a probabilistic graph \(\mathcal{G}\) is a triple (V, E, p), where V is the set of nodes, E is the set of edges and \(p: E \rightarrow (0,1]\). The main challenge posed by this formulation is that problems that are relatively easy to solve in exact graphs become very difficult (or even intractable) in probabilistic graphs. In this paper, we present an overview of the algorithmic techniques proposed in the literature for uncertain graph analysis, focusing on the following graph mining tasks: clustering, maximal cliques, k-nearest neighbors and core decomposition. We conclude the paper with a short discussion of distributed mining of uncertain graphs, which is expected to yield significant performance improvements.
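The possible-world semantics described in the abstract can be made concrete with a short Python sketch. It assumes, as is common in this literature but not stated explicitly above, that edges exist independently, so a possible world \(G = (V, E_G)\) with \(E_G \subseteq E\) has probability \(\Pr[G] = \prod_{e \in E_G} p(e) \prod_{e \in E \setminus E_G} (1 - p(e))\). To illustrate why easy problems become hard, it uses s-t reachability: a single breadth-first search in an exact graph, whereas its probabilistic counterpart (the probability that t is reachable from s, known as two-terminal reliability) is known to be #P-complete in general. The exact version therefore enumerates all \(2^{|E|}\) worlds, while the sampling version trades exactness for a Monte Carlo estimate. The edge representation and all function names (`world_probability`, `exact_reliability`, `sampled_reliability`) are hypothetical; this is an illustrative sketch, not a method from the paper.

```python
import itertools
import random
from collections import deque

# Hypothetical representation: each undirected edge (u, v) maps to its existence
# probability p(e) in (0, 1]. Edges are ASSUMED to exist independently.
Edge = tuple[str, str]


def world_probability(edges: dict[Edge, float], present: set[Edge]) -> float:
    """Probability of the possible world that contains exactly the edges in `present`."""
    prob = 1.0
    for e, p in edges.items():
        prob *= p if e in present else 1.0 - p
    return prob


def reachable(world: set[Edge], s: str, t: str) -> bool:
    """Breadth-first search from s to t in one deterministic possible world."""
    adj: dict[str, list[str]] = {}
    for u, v in world:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


def exact_reliability(edges: dict[Edge, float], s: str, t: str) -> float:
    """Sum the probabilities of all 2^|E| possible worlds in which t is reachable from s."""
    edge_list = list(edges)
    total = 0.0
    for mask in itertools.product([False, True], repeat=len(edge_list)):
        world = {e for e, keep in zip(edge_list, mask) if keep}
        if reachable(world, s, t):
            total += world_probability(edges, world)
    return total


def sampled_reliability(edges: dict[Edge, float], s: str, t: str,
                        samples: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate: sample worlds by keeping each edge with probability p(e)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        world = {e for e, p in edges.items() if rng.random() < p}
        if reachable(world, s, t):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    g = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.2}
    print(exact_reliability(g, "a", "c"))    # about 0.4
    print(sampled_reliability(g, "a", "c"))  # close to 0.4
```

In the three-edge example, a and c are connected either by the direct edge (0.2) or by the path a-b-c (0.5 × 0.5 = 0.25), giving \(1 - (1 - 0.2)(1 - 0.25) = 0.4\). The Monte Carlo estimate approaches this value as `samples` grows, at a cost linear in the number of samples rather than exponential in |E|.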


Notes

  1. Although existential probabilities can be assigned to the vertices of the graph as well, in this paper we focus on edge probabilities only.

Author information

Correspondence to Apostolos N. Papadopoulos.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kassiano, V., Gounaris, A., Papadopoulos, A.N., Tsichlas, K. (2017). Mining Uncertain Graphs: An Overview. In: Sellis, T., Oikonomou, K. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2016. Lecture Notes in Computer Science, vol 10230. Springer, Cham. https://doi.org/10.1007/978-3-319-57045-7_6


  • DOI: https://doi.org/10.1007/978-3-319-57045-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57044-0

  • Online ISBN: 978-3-319-57045-7

  • eBook Packages: Computer Science, Computer Science (R0)
