Abstract
This chapter discusses combinatorial optimization aspects, concepts, and applications arising in the broad area of network-based data mining. Representing real-world datasets as large-scale networks (graphs) has become increasingly popular in recent years. The chapter briefly reviews the graph-theoretic and combinatorial optimization concepts that are important in data mining and discusses how these concepts are interpreted from a mathematical modeling perspective.
Notes
1. These results are valid asymptotically almost surely (a.a.s.); that is, the probability that a given property holds tends to 1 as the number of vertices N goes to infinity.
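Stated in symbols, the a.a.s. condition can be sketched as follows. The notation is assumed here for illustration and is not taken from the chapter: G_N denotes a random graph on N vertices and \mathcal{P} the property in question.

```latex
\[
  \lim_{N \to \infty} \Pr\bigl[\, G_N \text{ has property } \mathcal{P} \,\bigr] = 1
\]
```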
Copyright information
© 2013 Springer Science+Business Media New York
About this entry
Cite this entry
Shirokikh, O., Stozhkov, V., Boginski, V. (2013). Combinatorial Optimization Techniques for Network-Based Data Mining. In: Pardalos, P., Du, DZ., Graham, R. (eds) Handbook of Combinatorial Optimization. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7997-1_6
DOI: https://doi.org/10.1007/978-1-4419-7997-1_6
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7996-4
Online ISBN: 978-1-4419-7997-1
eBook Packages: Mathematics and Statistics; Reference Module Computer Science and Engineering