Abstract
Mixed membership factorization is a popular approach for analyzing data sets that exhibit within-sample heterogeneity. In recent years, several algorithms have been developed for mixed membership matrix factorization, but they only guarantee convergence to a local optimum. Here, we derive a global optimization algorithm that provides a guaranteed 𝜖-global optimum for a sparse mixed membership matrix factorization problem. We test the algorithm on simulated data and a small real gene expression dataset and find that the algorithm bounds the global optimum across random initializations and explores multiple modes efficiently.
References
Airoldi, E.M., Blei, D.M., Fienberg, S.E., Xing, E.P.: Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9, 1981–2014 (2008)
Benders, J.F.: Partitioning procedures for solving mixed-variables programming problems. Numer. Math. 4(1), 238–252 (1962)
Blei, D.M., Lafferty, J.D.: Correlated topic models. In: Proceedings of the International Conference on Machine Learning, pp 113–120 (2006)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112(518), 859–877 (2017)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Dheeru, D., Karra, T.E.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Floudas, C.A.: Deterministic Global Optimization, Nonconvex Optimization and Its Applications, vol 37. Springer, Boston (2000)
Floudas, C.A.: Deterministic Global Optimization: Theory, Methods and Applications, vol. 37. Springer, Berlin (2013)
Floudas, C.A., Gounaris, C.E.: A review of recent advances in global optimization. J. Glob. Optim. 45, 3–38 (2008)
Floudas, C.A., Visweswaran, V.: A global optimization algorithm (GOP) for certain classes of nonconvex NLPs. Comput. Chem. Eng. 14(12), 1–34 (1990)
Geoffrion, A.M.: Generalized benders decomposition. J. Optim. Theory Appl. 10, 237–260 (1972)
Gorski, J., Pfeuffer, F., Klamroth, K.: Biconvex sets and optimization with biconvex functions: a survey and extensions. Math. Methods Oper. Res. 66, 373–407 (2007)
Gurobi Optimization, Inc.: Gurobi Optimizer Version 8.0 (2018)
Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches. Springer, Berlin (2013)
Kabán, A.: On Bayesian classification with Laplace priors. Pattern Recognit. Lett. 28(10), 1271–1282 (2007)
Lancaster, P., Tismenetsky, M.: The Theory of Matrices: With Applications. Elsevier, San Diego (1985)
Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)
MacKay, D.J.C.: Bayesian interpolation. Neural Comput. 4(3), 415–447 (1992)
Mackey, L., Weiss, D., Jordan, M.I.: Mixed membership matrix factorization. In: International Conference on Machine Learning, pp 1–8 (2010)
Pritchard, J.K., Stephens, M., Donnelly, P.: Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000)
Saddiki, H., McAuliffe, J., Flaherty, P.: GLAD: a mixed-membership model for heterogeneous tumor subtype classification. Bioinformatics 31(2), 225–232 (2015)
Singh, A.P., Gordon, G.J.: A unified view of matrix factorization models. In: Lecture Notes in Computer Science, vol. 5212, pp. 358–373, Springer, Berlin (2008)
Taddy, M.: Multinomial inverse regression for text analysis. J. Am. Stat. Assoc. 108(503), 755–770, (2013). https://doi.org/10.1080/01621459.2012.734168
Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.: Sharing clusters among related groups: hierarchical Dirichlet processes. In: Advances in Neural Information Processing Systems, vol. 1, MIT Press, Cambridge (2005)
Weinstein, J.N., Collisson, E.A., Mills, G.B., Shaw, K.R.M., Ozenberger, B.A., Ellrott, K., Shmulevich, I., Sander, C., Stuart, J.M., Cancer Genome Atlas Research Network, et al.: The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45(10), 1113 (2013)
Xiao, H., Stibor, T.: Efficient collapsed Gibbs sampling for latent Dirichlet allocation. In: Sugiyama, M., Yang, Q. (eds.) Proceedings of 2nd Asian Conference on Machine Learning, vol. 13, pp. 63–78 (2010)
Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval–SIGIR ’03, p. 267 (2003)
Zaslavsky, T.: Facing Up to Arrangements: Face-Count Formulas for Partitions of Space by Hyperplanes, vol. 154. American Mathematical Society, Providence (1975)
Acknowledgements
We acknowledge Hachem Saddiki for valuable discussions and comments on the manuscript.
Appendices
Appendix 1: Derivation of Relaxed Dual Problem Constraints
The Lagrange function is the sum of the Lagrange functions for each sample,
and the Lagrange function for a single sample is
We see that the Lagrange function is biconvex in \(x\) and \(\theta_i\). We develop the constraints for a single sample in the remainder of this appendix.
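For concreteness, the following is a hedged sketch of the form such a per-sample Lagrangian takes, assuming a squared-error data-fit term with \(x \in \mathbb{R}^{M\times K}\), \(\theta_i\) on the \(K\)-simplex, a multiplier \(\lambda_i\) for the equality constraint, and \(\mu_i \ge 0\) for the nonnegativity constraint (an illustration of the structure, not necessarily the exact expression in the text):

\[
L(y_i, \theta_i, x, \lambda_i, \mu_i) \;=\; \|y_i - x\theta_i\|_2^2 \;+\; \lambda_i\left(\mathbf{1}^T\theta_i - 1\right) \;-\; \mu_i^T\theta_i .
\]

The coupling term \(\theta_i^T x^T x\,\theta_i\) in the quadratic expansion is convex in \(x\) for fixed \(\theta_i\) and convex in \(\theta_i\) for fixed \(x\), but not jointly convex, which is the source of the biconvexity.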
1.1 Linearized Lagrange Function with Respect to x
Casting x as a vector and rewriting the Lagrange function gives
where \(\bar {x}\) is formed by stacking the columns of x in order. The coefficients are formed such that
The linear coefficient is the KM × 1 vector,
The quadratic coefficient is the KM × KM block matrix
The Taylor series approximation about \(x_0\) is
The gradient with respect to x is
Plugging the gradient into the Taylor series approximation gives
Simplifying the linearized Lagrange function gives
Finally, we write the linearized Lagrangian using the matrix form of \(x_0\),
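As an illustrative sketch under the same assumptions as the Lagrangian above (not necessarily the exact expression in the text), the first-order expansion keeps the value and gradient at \(x_0\):

\[
L(y_i, \theta_i, x, \lambda_i, \mu_i)\Big|^{\text{lin}}_{x_0}
\;=\; L(y_i, \theta_i, x_0, \lambda_i, \mu_i)
\;+\; \nabla_{\bar{x}} L(y_i, \theta_i, x_0, \lambda_i, \mu_i)^T\left(\bar{x} - \bar{x}_0\right).
\]

Because the gradient evaluated at \(x_0\) still depends on \(\theta_i\), the result is linear in \(\bar{x}\) but remains quadratic in \(\theta_i\), which leads to the convexity issue discussed next.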
While the original Lagrange function is convex in \(\theta_i\) for a fixed \(x\), the linearized Lagrange function is not necessarily convex in \(\theta_i\). This can be seen by collecting the quadratic, linear, and constant terms with respect to \(\theta_i\),
Now, \(L(y_i, \theta_i, x, \lambda_i, \mu_i)\big|^{\text{lin}}_{x_0}\) is convex in \(\theta_i\) if and only if \(2x_0^Tx - x_0^Tx_0 \succeq 0\), that is, the matrix is positive semidefinite. The condition is satisfied at \(x = x_0\) but may be violated at some other value of \(x\).
1.2 Linearized Lagrange Function with Respect to \(\theta_i\)
Now, we linearize (7.18) with respect to \(\theta_i\). The Taylor series approximation about \(\theta_{0i}\) gives
The gradient for this Taylor series approximation is

where \(g_i(x)\) is the vector of \(K\) qualifying constraints associated with the Lagrange function. Each qualifying constraint is linear in \(x\). Plugging the gradient into the approximation gives
The linearized Lagrange function is bilinear in \(x\) and \(\theta_i\). Finally, simplifying the linearized Lagrange function gives
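Under the same illustrative assumptions, the expansion about \(\theta_{0i}\) has the form

\[
L\Big|^{\text{lin}}_{\theta_{0i}}
\;=\; L(y_i, \theta_{0i}, x, \lambda_i, \mu_i) \;+\; g_i(x)^T\left(\theta_i - \theta_{0i}\right),
\qquad
g_i(x) \;=\; \nabla_{\theta_i} L(y_i, \theta_i, x, \lambda_i, \mu_i)\Big|_{\theta_i = \theta_{0i}},
\]

and the qualifying constraints, set to zero, give the hyperplanes in \(x\) whose arrangement is explored by the search procedure in Appendix 3.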
Appendix 2: Proof of Biconvexity
To prove that the optimization problem is biconvex, we first show that the feasible region over which we are optimizing is biconvex. Then, we show that the objective function is biconvex by fixing one block of variables and showing convexity with respect to the other, and vice versa.
1.1 The Constraints Form a Biconvex Feasible Region
Our constraints can be written as
The inequality constraint (7.25) is convex when either \(x\) or \(\theta\) is fixed, because any norm is convex. The equality constraint (7.26) is affine when either \(x\) or \(\theta\) is fixed, and every affine set is convex. The inequality constraint (7.27) is convex when either \(x\) or \(\theta\) is fixed, because it is linear in \(\theta\).
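As a hedged illustration of this structure (with a generic sparsity bound \(\delta\); the exact form is that of the problem formulation earlier in the chapter), the feasible region can be pictured as

\[
\|x\|_1 \le \delta, \qquad \mathbf{1}^T\theta_i = 1, \qquad \theta_i \ge 0, \qquad i = 1,\ldots,N,
\]

where the \(\ell_1\) bound induces sparsity in \(x\) and the simplex constraints define the mixed membership weights \(\theta_i\).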
1.2 The Objective Is Convex with Respect to θ
We prove the objective is a biconvex function using the following two theorems.
Theorem 1
Let \(A \subseteq {\mathbb {R}^n}\) be a convex open set and let \(f: A \rightarrow \mathbb {R}\) be twice differentiable. Write H(x) for the Hessian matrix of f at x ∈ A. If H(x) is positive semidefinite for all x ∈ A, then f is convex (Boyd and Vandenberghe 2004).
Theorem 2
A symmetric matrix A is positive semidefinite (PSD) if and only if there exists B such that A = B TB (Lancaster et al. 1985).
The objective of our problem is,
The objective function is the sum of the objective functions for each sample.
The gradient with respect to \(\theta_i\) is
Taking the second derivative with respect to \(\theta_i\) gives the Hessian matrix,


The Hessian matrix \(\nabla_{\theta_i}^2 f(y_i,x,\theta_i)\) is positive semidefinite by Theorem 2. Then, \(f(y_i, x, \theta_i)\) is convex in \(\theta_i\) by Theorem 1. The objective \(f(y, x, \theta)\) is convex with respect to \(\theta\), because the sum of convex functions, \(\sum_{i=1}^{N}f(y_i,x,\theta_i)\), is itself convex.
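To illustrate the argument, under the squared-error assumption \(f(y_i, x, \theta_i) = \|y_i - x\theta_i\|_2^2\) used in the sketches above,

\[
\nabla_{\theta_i} f \;=\; 2x^T\left(x\theta_i - y_i\right),
\qquad
\nabla_{\theta_i}^2 f \;=\; 2x^Tx \;=\; \left(\sqrt{2}\,x\right)^T\left(\sqrt{2}\,x\right),
\]

so the Hessian factors as \(B^TB\) with \(B=\sqrt{2}\,x\), which is exactly the form required by Theorem 2.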
1.3 The Objective Is Convex with Respect to x
The objective function for sample i is
We cast x as a vector \(\bar {x}\), which is formed by stacking the columns of x in order. We rewrite the objective function as
The coefficients are formed such that
The linear coefficient is the KM × 1 vector
The quadratic coefficient is the KM × KM block matrix

The gradient with respect to \(\bar{x}\) is
Taking the second derivative gives the Hessian matrix,
The Hessian matrix \(\nabla_{\bar{x}}^2 f(y_i,\bar{x},\theta_i)\) is positive semidefinite by Theorem 2. Then, \(f(y_i,\bar{x},\theta_i)\) is convex in \(\bar{x}\) by Theorem 1. The objective \(f(y, x, \theta)\) is convex with respect to \(x\), because the sum of convex functions, \(\sum_{i=1}^{N}f(y_i,x,\theta_i)\), is itself convex.
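Under the same illustrative assumption, writing \(x\theta_i = (\theta_i^T \otimes I_M)\,\bar{x}\) for the vectorized \(\bar{x}\),

\[
\nabla_{\bar{x}} f \;=\; 2\left(\theta_i \otimes I_M\right)\left((\theta_i^T \otimes I_M)\bar{x} - y_i\right),
\qquad
\nabla_{\bar{x}}^2 f \;=\; 2\left(\theta_i\theta_i^T \otimes I_M\right) \;=\; B^TB,
\quad B = \sqrt{2}\left(\theta_i^T \otimes I_M\right),
\]

which is a KM × KM block matrix of the kind referred to above and is positive semidefinite by Theorem 2.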
The objective is therefore biconvex in \(x\) and \(\theta\). Together with the biconvexity of the feasible region, this shows that we have a biconvex optimization problem.
Appendix 3: A-Star Search Algorithm
In this procedure, we first remove all duplicate and all-zero-coefficient hyperplanes to obtain the unique hyperplanes. We then start from a specific region r and place it in an open set, which maintains the list of regions that still need to be explored. At each step, we pick one region from the open set and find its adjacent regions; once this step is finished, the region is moved to a closed set, which maintains the list of regions that have already been explored. Any adjacent region that has not been seen before is added to the open set so that it will be explored in turn. When the open set is empty, the closed set contains all of the unique regions, and the number of unique regions is the size of the closed set. The procedure thus begins from one region and expands to its neighbors until no new neighbors are found.
An overview of the A-star search algorithm for identifying unique regions is shown in Algorithm 1.
Algorithm 1 A-star Search Algorithm

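The following minimal Python sketch illustrates this open/closed-set traversal (the function and variable names, and the use of scipy, are illustrative assumptions, not taken from the chapter). A region is represented by the vector of signs of \(A^\prime x - b\), and a candidate neighbor, obtained by flipping a single sign, is kept only if a small linear program certifies that it is nonempty.

import numpy as np
from scipy.optimize import linprog

def region_interior(signs, A, b, tol=1e-9):
    """Return a strictly interior point of the open region
    {x : signs[j] * (A[j] @ x - b[j]) > 0 for all j}, or None if it is empty.
    Solves a small LP that maximizes a common slack t (capped at 1 to stay bounded)."""
    H, n = A.shape
    c = np.zeros(n + 1)
    c[-1] = -1.0                                      # maximize t  <=>  minimize -t
    A_ub = np.hstack([-(signs[:, None] * A), np.ones((H, 1))])
    b_ub = -signs * b
    bounds = [(None, None)] * n + [(None, 1.0)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if res.success and -res.fun > tol:
        return res.x[:n]
    return None

def enumerate_regions(A, b, start_signs):
    """Open/closed-set search over sign vectors: start from one nonempty region
    (start_signs is assumed to describe a nonempty region) and repeatedly flip one
    hyperplane's sign to generate candidate adjacent regions."""
    open_set = [tuple(start_signs)]
    closed_set = set()
    while open_set:
        region = open_set.pop()
        if region in closed_set:
            continue
        closed_set.add(region)                        # region is now fully explored
        for j in range(A.shape[0]):
            neighbour = list(region)
            neighbour[j] = -neighbour[j]              # cross hyperplane j
            neighbour = tuple(neighbour)
            if neighbour not in closed_set and \
               region_interior(np.array(neighbour, dtype=float), A, b) is not None:
                open_set.append(neighbour)            # new nonempty region to explore
    return closed_set

# Example: two coordinate hyperplanes in the plane partition it into four regions.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.zeros(2)
regions = enumerate_regions(A, b, start_signs=(1.0, 1.0))
print(len(regions))    # expected output: 4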
1.1 Hyperplane Filtering
Suppose two different hyperplanes \(H_i\) and \(H_j\) are represented by coefficient vectors \(A_i=\left\{a_{i,0},\ldots,a_{i,MK}\right\}\) and \(A_j=\left\{a_{j,0},\ldots,a_{j,MK}\right\}\). We consider the two hyperplanes duplicates when
This can be converted to
where the threshold \(\tau\) is a small positive value.
We eliminate a hyperplane \(H_i\) represented by \(A_i=\left\{a_{i,0},\ldots,a_{i,MK}\right\}\) from the hyperplane arrangement \({\mathcal{A}}\) if the coefficients of \(A_i\) are all zero,
The arrangement \({\mathcal{A}}^\prime\) is the reduced arrangement, and \(A^\prime x = b\) are the equations of the unique hyperplanes.
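A minimal Python sketch of this filtering step (the names, and the choice to store each hyperplane as a coefficient row together with an offset, are illustrative assumptions) normalizes each coefficient vector and drops all-zero rows and near-duplicates within the threshold τ:

import numpy as np

def filter_hyperplanes(A, b, tau=1e-8):
    """Drop all-zero hyperplanes and duplicates whose normalized coefficients
    (up to an overall sign) differ by less than the threshold tau."""
    kept_A, kept_b = [], []
    for a_i, b_i in zip(A, b):
        coeffs = np.append(a_i, b_i)
        norm = np.linalg.norm(coeffs)
        if norm < tau:                                # all-zero coefficients: eliminate
            continue
        candidate = coeffs / norm                     # scale-invariant representation
        duplicate = any(
            min(np.linalg.norm(candidate - kept), np.linalg.norm(candidate + kept)) < tau
            for kept in (np.append(a_k, b_k) / np.linalg.norm(np.append(a_k, b_k))
                         for a_k, b_k in zip(kept_A, kept_b))
        )
        if not duplicate:
            kept_A.append(a_i)
            kept_b.append(b_i)
    return np.array(kept_A), np.array(kept_b)

# Example: the second row is a scaled copy of the first and the third is all zeros.
A = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 0.0]])
b = np.array([1.0, 2.0, 0.0])
A_reduced, b_reduced = filter_hyperplanes(A, b)
print(A_reduced, b_reduced)   # only the first hyperplane remains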
1.2 Interior Point Method
An interior point is found by solving the following optimization problem:

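A common formulation for this step, given here as a hedged sketch (it may differ in detail from the chapter's exact program), maximizes the smallest slack of the region's defining inequalities. For a region with sign vector \(s \in \{-1,+1\}^{|\mathcal{A}^\prime|}\),

\[
\max_{x,\,t}\; t
\quad \text{subject to} \quad
s_j\left(a_j^T x - b_j\right) \ge t \;\;\text{for all hyperplanes } j,
\qquad t \le 1,
\]

where any solution with \(t > 0\) gives a strictly interior point of the region, and the cap \(t \le 1\) keeps the linear program bounded.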
Algorithm 2 Interior Point Method (Component 1)

Algorithm 3 Get Adjacent Regions (Component 2)
