Combinatorial Optimization Techniques for Network-Based Data Mining

Shirokikh, Oleg; Stozhkov, Vladimir; Boginski, Vladimir

doi:10.1007/978-1-4419-7997-1_6


Abstract

This chapter discusses combinatorial optimization aspects, concepts, and applications arising in the broad area of network-based data mining. Representing real-world datasets as large-scale networks (graphs) has become increasingly popular in recent years. The purpose of this chapter is to briefly review the graph-theoretic and combinatorial optimization concepts that are important in the context of data mining, and to discuss the interpretation of these concepts from a mathematical modeling perspective.

Notes

1. These results hold asymptotically almost surely (a.a.s.): the probability that a given property holds tends to 1 as the number of vertices N goes to infinity.
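
The a.a.s. condition can be stated formally. This is a minimal sketch: the symbols $G_N$ (a random graph on $N$ vertices) and $\mathcal{P}$ (the property in question) are notation assumed here for illustration and are not taken from the chapter.

```latex
\[
  \lim_{N \to \infty} \Pr\bigl[\, G_N \text{ has property } \mathcal{P} \,\bigr] = 1 .
\]
```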


Author information

Authors and Affiliations

Industrial and Systems Engineering Department, University of Florida, 32611, Gainesville, FL, USA
Oleg Shirokikh
Industrial and Systems Engineering Department, University of Florida, 32611, Gainesville, FL, USA
Vladimir Stozhkov
Industrial and Systems Engineering Department, University of Florida, 32611, Shalimar, FL, USA
Vladimir Boginski

Corresponding author

Correspondence to Oleg Shirokikh.

Editor information

Editors and Affiliations

Department of Industrial and Systems Eng, University of Florida, Gainesville, Florida, USA
Panos M. Pardalos
Department of Computer Science, University of Texas, Dallas, Richardson, Texas, USA
Ding-Zhu Du
Dept. Comp. Sci. & Engineering, University of California, San Diego, La Jolla, California, USA
Ronald L. Graham

Cite this entry

Shirokikh, O., Stozhkov, V., Boginski, V. (2013). Combinatorial Optimization Techniques for Network-Based Data Mining. In: Pardalos, P., Du, DZ., Graham, R. (eds) Handbook of Combinatorial Optimization. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7997-1_6

DOI: https://doi.org/10.1007/978-1-4419-7997-1_6
Published: 26 July 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7996-4
Online ISBN: 978-1-4419-7997-1
eBook Packages: Mathematics and Statistics; Reference Module Computer Science and Engineering