Combinatorial Optimization Techniques for Network-Based Data Mining

  • Reference work entry
Handbook of Combinatorial Optimization

Abstract

This chapter discusses combinatorial optimization aspects, concepts, and applications that arise in the broad area of network-based data mining. Representing real-world datasets as large-scale networks (graphs) has become increasingly popular in recent years. The chapter briefly reviews the graph-theoretic and combinatorial optimization concepts that are important in the context of data mining and discusses how these concepts can be interpreted from a mathematical modeling perspective.

Notes

  1. These results are valid asymptotically almost surely (a.a.s.), which means that the probability that a given property holds tends to 1 as the number of vertices N goes to infinity.
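
Written out formally, the a.a.s. condition in the note can be sketched as the limit below. The symbols are assumptions introduced here for illustration and are not taken from the chapter: $G_N$ denotes a random graph on $N$ vertices drawn from the model under study, and $\mathcal{P}$ denotes the set of graphs that have the property in question.

```latex
% Assumed notation (not from the chapter):
%   G_N          random graph on N vertices
%   \mathcal{P}  set of graphs having the given property
\[
  \lim_{N \to \infty} \Pr\bigl[\, G_N \in \mathcal{P} \,\bigr] = 1 .
\]
```

Under this reading, the property may fail for some graphs at every finite N; the statement only requires that the probability of failure vanishes in the limit.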

Author information

Corresponding author

Correspondence to Oleg Shirokikh.

Copyright information

© 2013 Springer Science+Business Media New York

About this entry

Cite this entry

Shirokikh, O., Stozhkov, V., Boginski, V. (2013). Combinatorial Optimization Techniques for Network-Based Data Mining. In: Pardalos, P., Du, DZ., Graham, R. (eds) Handbook of Combinatorial Optimization. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7997-1_6
