Sparse Inverse Covariance Estimation for Graph Representation of Feature Structure

Lee, Sangkyun

doi:10.1007/978-3-662-43968-5_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8401)

Abstract

Access to more information, provided by modern high-throughput measurement systems, has made it possible to investigate finer details of complex systems. However, it has also increased the number of features, and thereby the dimensionality of the data, to be processed in data analysis. Higher dimensionality makes it particularly challenging to understand complex systems, because it blows up the number of possible feature configurations we need to consider. Structure learning with the Gaussian Markov random field can provide a remedy by identifying the conditional independence structure of features in a form that is easy to visualize and understand. The learning is based on a convex optimization problem, called sparse inverse covariance estimation, for which many efficient algorithms have been developed in the past few years. When the dimension is much larger than the sample size, structure learning must take statistical stability into account, and here connections to data mining arise in terms of discovering common or rare subgraphs as patterns. The outcome of structure learning can be visualized as graphs, represented according to additional information if required, providing a perceivable way to investigate complex feature spaces.
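
The link between a Gaussian Markov random field and a graph can be illustrated with a small sketch. It is not taken from the chapter and does not perform sparse estimation; it only shows the standard fact that a zero entry in the inverse covariance (precision) matrix of a Gaussian corresponds to a missing edge, that is, to conditional independence of the two features given the rest. The three-feature chain covariance, the value `rho = 0.5`, the tolerance, and the helper names `invert` and `edges_from_precision` are all assumptions chosen for illustration.

```python
# Illustrative sketch (assumed example, not the chapter's method):
# zeros in the precision matrix of a Gaussian mark missing graph edges.

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity on the right-hand side.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def edges_from_precision(precision, tol=1e-9):
    """Return undirected edges (i, j), i < j, where the precision entry is nonzero."""
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(precision[i][j]) > tol]


# Hypothetical chain of three features: covariance decays as rho^|i - j|.
rho = 0.5
covariance = [[rho ** abs(i - j) for j in range(3)] for i in range(3)]
precision = invert(covariance)
edges = edges_from_precision(precision)

print("cov(0, 2)       =", covariance[0][2])   # marginally correlated
print("precision(0, 2) =", precision[0][2])    # conditionally independent
print("edges           =", edges)
```

In this toy case every pair of features is marginally correlated, yet the precision matrix is tridiagonal, so the recovered graph is the chain 0 - 1 - 2. In practice the precision matrix has to be estimated from data, and the sparse estimation problem named in the abstract is what produces such zeros when the dimension exceeds the sample size.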

Author information

Authors and Affiliations

Fakultät für Informatik, LS VIII Technische Universität Dortmund, 44221, Dortmund, Germany
Sangkyun Lee

Editor information

Editors and Affiliations

Research Unit Human-Computer Interaction, Austrian IBM Watson Think Group, Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Auenbruggerplatz 2/V, 8036, Graz, Austria
Andreas Holzinger
IBM Life Sciences Discovery Centre, TECHNA for the Advancement of Technology for Health, Princess Margaret Cancer Centre, University Health Network, TMDT Room 11-314, 101 College Street, M5G 1L7, Toronto, ON, Canada
Igor Jurisica

About this chapter

Cite this chapter

Lee, S. (2014). Sparse Inverse Covariance Estimation for Graph Representation of Feature Structure. In: Holzinger, A., Jurisica, I. (eds) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. Lecture Notes in Computer Science, vol 8401. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43968-5_13

DOI: https://doi.org/10.1007/978-3-662-43968-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43967-8
Online ISBN: 978-3-662-43968-5
eBook Packages: Computer Science, Computer Science (R0)
