# Algorithms for approximate subtropical matrix factorization


## Abstract

Matrix factorization methods are important tools in data mining and analysis. They can be used for many tasks, ranging from dimensionality reduction to visualization. In this paper we concentrate on the use of matrix factorizations for finding patterns from the data. Rather than using the standard algebra—and the summation of the rank-1 components to build the approximation of the original matrix—we use the subtropical algebra, which is an algebra over the nonnegative real values with the summation replaced by the maximum operator. Subtropical matrix factorizations allow “winner-takes-it-all” interpretations of the rank-1 components, revealing different structure than the normal (nonnegative) factorizations. We study the complexity and sparsity of the factorizations, and present a framework for finding low-rank subtropical factorizations. We present two specific algorithms, called Capricorn and Cancer, that are part of our framework. They can be used with data that has been corrupted with different types of noise, and with different error metrics, including the sum-of-absolute differences, Frobenius norm, and Jensen–Shannon divergence. Our experiments show that the algorithms perform well on data that has subtropical structure, and that they can find factorizations that are both sparse and easy to interpret.

## Keywords

Tropical algebra · Max-times algebra · Matrix factorizations · Data mining

## 1 Introduction

Finding simple patterns that can be used to describe the data is one of the main problems in data mining. The data mining literature knows many different techniques for this general task, but one of the most common pattern-finding techniques rarely gets classified as such. Matrix factorizations (or decompositions; the two terms are used interchangeably in this paper) represent the given input matrix \(\varvec{A}\) as a product of two (or more) factor matrices, \(\varvec{A} \approx \varvec{B}\varvec{C}\). This standard formulation of matrix factorizations makes their pattern mining nature less obvious, but let us write the matrix product \(\varvec{B}\varvec{C}\) as a sum of rank-1 matrices, \(\varvec{B}\varvec{C} = \sum _{i=1}^{k} \varvec{B}_{\cdot i}\varvec{C}_{i\cdot }\), where \(\varvec{B}_{\cdot i}\varvec{C}_{i\cdot }\) is the outer product of the \(i\hbox {th}\) column of \(\varvec{B}\) and the \(i\hbox {th}\) row of \(\varvec{C}\). Now it becomes clear that the rank-1 matrices \(\varvec{B}_{\cdot i}\varvec{C}_{i\cdot }\) are the “simple patterns”, and the matrix factorization is finding *k* such patterns whose sum is a good approximation of the original data matrix.
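This view of a factorization as a sum of rank-1 patterns is easy to verify numerically. The following sketch (a small random example of ours, not data from the paper) confirms that the ordinary matrix product equals the sum of the *k* outer products:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 5, 4, 3
B = rng.random((n, k))   # left factor
C = rng.random((k, m))   # right factor

# Rank-1 patterns: outer product of the i-th column of B and i-th row of C
patterns = [np.outer(B[:, i], C[i, :]) for i in range(k)]
approx = sum(patterns)

# The sum of the k rank-1 matrices equals the ordinary matrix product
assert np.allclose(approx, B @ C)
```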

This so-called “component interpretation” (Skillicorn 2007) is more appealing with some factorizations than with others. For example, the classical singular value decomposition (SVD) does not easily admit such an interpretation, as the components are not easy to interpret without knowing the earlier components. On the other hand, the motivation for the nonnegative matrix factorization (NMF) often comes from the component interpretation, as can be seen, for example, in the famous “parts of faces” figures of Lee and Seung (1999). The “parts-of-whole” interpretation is at the heart of NMF: every rank-1 component adds something to the overall decomposition, and never removes anything. This aids with the interpretation of the components, and is also often claimed to yield sparse factors, although this latter point is more contentious (see e.g. Hoyer 2004).

Perhaps the reason why matrix factorization methods are not often considered as pattern mining methods is that the rank-1 matrices are summed together to build the full data. Hence, it is rare for any rank-1 component to explain any part of the input matrix alone. But the use of summation as a way to aggregate the rank-1 components can be considered to be “merely” a consequence of the fact that we are using the standard algebra. If we change the algebra—in particular, if we change how we define the summation—we change the operator used for the aggregation. In this work, we propose to use the *maximum* operator to define the summation over the nonnegative matrices, giving us what is known as the *subtropical algebra*. As the aggregation of the rank-1 factors is now the element-wise maximum, we obtain what we call the “winner-takes-it-all” interpretation: the final value of each element in the approximation is defined only by the largest value in the corresponding element in the rank-1 matrices. This can be considered a staple of the subtropical structure—for each element in the data we can find a single rank-1 pattern, the “winner”, that determines its value exactly. This is in contrast to the NMF structure, where each pattern would only make a “contribution” to the final value.
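The difference between the two aggregation schemes can be made concrete with a toy example (illustrative values of ours): under the subtropical algebra the approximation is the element-wise maximum of the rank-1 patterns, so each entry is decided by a single “winning” pattern:

```python
import numpy as np

# Two rank-1 patterns (toy values, not from the paper)
P1 = np.outer([1.0, 0.5], [2.0, 4.0])   # [[2, 4], [1, 2]]
P2 = np.outer([3.0, 1.0], [1.0, 0.5])   # [[3, 1.5], [1, 0.5]]

nmf_style = P1 + P2            # standard algebra: patterns accumulate
subtrop = np.maximum(P1, P2)   # subtropical: winner takes it all

# Each entry of the subtropical approximation is exactly one pattern's entry
winner = np.where(P1 >= P2, P1, P2)
assert np.array_equal(subtrop, winner)
```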

The structure found under the subtropical algebra is *different*: the factorizations can be better or worse in terms of the reconstruction error, but the patterns they find are usually different to those found by NMF. It is also worth mentioning that the same dataset often has both kinds of structure in it, in which case subtropical and NMF patterns are complementary to each other, and depending on the application, one or the other can be more useful. One practical advantage of the subtropical methods, though, is that they tend to find more concise representations of patterns in the data, while NMF often splits them into several smaller components, making it harder to see the big picture.

To illustrate this, and in general the kind of structure subtropical matrix factorization can reveal and how it is different from that of NMF, we show example results on the European climate data (Fig. 1).

The data contains weather records for Europe between 1960 and 1990, and it was obtained from the global climate data repository.^{1} The data has \(2\,575\) rows that correspond to 50-by-50 kilometer squares of land where measurements were made, and 48 columns corresponding to observations. More precisely, the first 12 columns represent the average low temperature for each month, the next 12 columns the average high temperature, and the next 12 columns the daily mean. The last 12 columns represent the mean monthly precipitation for each month. We preprocessed every column of the data by first subtracting its mean, dividing by the standard deviation, and then subtracting the minimum value, so that the smallest value becomes 0. We compare the results of our subtropical matrix factorization algorithm Cancer to those of the NMF algorithm that obtained the best reconstruction error on this data (see Table 2 in Sect. 5). For both methods, we chose two factors: one that best identifies the areas of high precipitation and another that reflects summer (i.e. June, July, and August) daily maximum temperatures. To be able to validate the results of the algorithms, we also include the average annual precipitation and average summer maximum temperature in Fig. 2a, b, respectively.
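The column-wise preprocessing described above can be sketched as follows (the function name is ours; a minimal illustration, not the authors' code):

```python
import numpy as np

def preprocess_columns(X):
    """Standardize each column, then shift it so its minimum becomes 0."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # zero mean, unit variance
    return X - X.min(axis=0)                  # smallest value becomes 0

rng = np.random.default_rng(1)
data = rng.normal(loc=10.0, scale=3.0, size=(100, 4))  # stand-in data
P = preprocess_columns(data)

assert np.allclose(P.min(axis=0), 0.0)  # every column now starts at 0
assert (P >= 0).all()                   # all values are nonnegative
```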

In order to make the above argument more concrete, let us see what happens when we try to combine Cancer’s factors using the standard algebra instead of the subtropical one. Recall that if \(\varvec{B}\varvec{C}\) is a rank-*k* matrix decomposition of \(\varvec{A}\), then we have \(\varvec{B}\varvec{C} = \sum _{s=1}^{k} \varvec{B}_{\cdot s}\varvec{C}_{s\cdot }\), where each pattern \(\varvec{B}_{\cdot s}\varvec{C}_{s\cdot }\) is an outer product of the \(s\hbox {th}\) column of \(\varvec{B}\) and the \(s\hbox {th}\) row of \(\varvec{C}\). If for some *l* and *t* we have \((\varvec{B}_{\cdot s}\varvec{C}_{s\cdot })_{lt} > \varvec{A}_{lt}\), then also \((\varvec{B}\varvec{C})_{lt} > \varvec{A}_{lt}\), since all values are nonnegative. It is therefore generally undesirable for any subset of the patterns to overcover values in the original data, as there would be no way of decreasing these values by adding more patterns. As an example we will combine the patterns corresponding to Cancer’s factors from Fig. 1a, b. To obtain the actual rank-1 patterns we first need to compute the outer products of these factors with the corresponding rows of the right-hand side matrix. If we denote the obtained patterns by \(\varvec{P}_1\) and \(\varvec{P}_2\), then the elements of the matrix \(\max \lbrace \varvec{P}_1 + \varvec{P}_2 - \varvec{A}, 0\rbrace \) show by how much the combination of \(\varvec{P}_1\) and \(\varvec{P}_2\) overcovers the original data \(\varvec{A}\). We now plot the average value of every row of this overcover matrix, scaled by the average value in the original data (Fig. 3a). Since each row corresponds to a location on the map, this shows the average amount by which we would overcover the data, were we to use the standard algebra for combining Cancer’s factors. It is evident that this method produces many values that are too high (mostly around the Alps and other high-precipitation areas). On the other hand, when we perform the same procedure using the subtropical algebra (Fig. 3b), there is almost no overcovering.
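One plausible way to formalize this overcover computation (our notation; \(\varvec{P}_1, \varvec{P}_2\) stand for the two rank-1 patterns, with random stand-ins here) is:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((6, 5))                       # stand-in for the data matrix
P1 = np.outer(rng.random(6), rng.random(5))  # two rank-1 patterns
P2 = np.outer(rng.random(6), rng.random(5))

# Standard algebra combines by summation, subtropical by maximum
over_sum = np.maximum(P1 + P2 - A, 0.0)              # overcover under +
over_max = np.maximum(np.maximum(P1, P2) - A, 0.0)   # overcover under max

# Per-row average overcover, scaled by the data's average value
row_over = over_sum.mean(axis=1) / A.mean()

# The subtropical combination never overcovers more than the sum does
assert (over_max <= over_sum + 1e-12).all()
```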

It is worth mentioning that, although the UK and the coastal regions of Norway are not prominent in the NMF factor shown above, they actually belong to some of its other factors (see Fig. 15b). In other words, the high-precipitation pattern is split into several parts and partially merged with other factors. This is likely a consequence of the pattern-splitting nature of NMF mentioned earlier. On the other hand, using the subtropical structure, we were able to isolate the high-precipitation pattern and present it in a single factor.

While the above discussion shows that the subtropical model can be a useful complement to NMF, it is difficult to claim that either of them is superior in general. For example, Cancer generally provided a more concise representation of the patterns in the climate data, outlining its most prominent properties, while NMF’s strength was recovering the smooth transitions between values.

*Contributions and a roadmap* In this paper, we study the use of subtropical decompositions for data analysis.^{2} We start by studying the theoretical aspects of the problem (Sect. 3), showing that the problem is NP-hard even to approximate, but also that sparse matrices have sparse dominated subtropical decompositions.

In Sect. 4, we develop a general framework, called Equator, for finding approximate, low-rank subtropical decompositions, and we present two instances of this framework, tailored towards different types of data and noise, called Capricorn and Cancer. Capricorn assumes discrete data with noise that randomly flips values to random numbers, whereas Cancer assumes continuous-valued data with standard Gaussian noise.

Our experiments (Sect. 5) show that both Capricorn and Cancer work well on datasets that have the kind of noise they are designed for, and that they outperform SVD and different NMF methods when the data has subtropical structure. On real-world data, Cancer is usually the better of the two, although in terms of reconstruction error neither method can challenge SVD. On the other hand, we show that both Capricorn and Cancer return interpretable results that reveal different aspects of the data compared to factorizations made under the standard algebra.

## 2 Notation and basic definitions

*Basic notation* Throughout this paper, we will denote a matrix by upper-case boldface letters (\(\varvec{A}\)), and vectors by lower-case boldface letters (\(\varvec{a}\)). The \(i\hbox {th}\) row of matrix \(\varvec{A}\) is denoted by \(\varvec{A}_{i\cdot }\) and the \(j\hbox {th}\) column by \(\varvec{A}_{\cdot j}\). The matrix \(\varvec{A}\) with the \(i\hbox {th}\) column removed is denoted by \(\varvec{A}_{\cdot -i}\), and \(\varvec{A}_{-i\cdot }\) is the respective notation for \(\varvec{A}\) with a removed row. Most matrices and vectors in this paper are restricted to the nonnegative real numbers \(\mathbb {R}_+ = [0, \infty )\).

We use the shorthand [*n*] to denote the set \(\{1, 2, \ldots , n\}\).

*Algebras* In this paper we consider matrix factorization over the so-called *max-times* (or *subtropical*) *algebra*. It differs from the standard algebra of the real numbers in that addition is replaced with the operation of taking the maximum, and the domain is restricted to the set of nonnegative real numbers.

### Definition 1

The *max-times* (or *subtropical*) algebra is the set \(\mathbb {R}_+ = [0, \infty )\) of nonnegative real numbers together with the operations \(a \oplus b = \max \lbrace a, b\rbrace \) (addition) and \(a \otimes b = ab\) (multiplication), defined for any \(a, b \in \mathbb {R}_+\). The identity element for addition is 0 and for multiplication it is 1.

In what follows, we will use the notation \(a \oplus b\) and \(\max \lbrace a, b\rbrace \), as well as the names *max-times* and *subtropical*, interchangeably. It is straightforward to see that the max-times algebra is a *dioid*, that is, a semiring with idempotent addition (\(a \oplus a = a\)). It is important to note that the subtropical algebra is anti-negative, that is, there is no subtraction operation.

A very closely related algebraic structure is the *max-plus* (*tropical*) algebra (see e.g. Akian et al. 2007).

### Definition 2

The *max-plus* (or *tropical*) algebra is defined over the set of extended real numbers \(\overline{\mathbb {R}} = \mathbb {R}\cup \lbrace -\infty \rbrace \) with operations \(a \oplus b = \max \lbrace a, b\rbrace \) (addition) and \(a \otimes b = a + b\) (multiplication). The identity elements for addition and multiplication are \(-\infty \) and 0, respectively.

The tropical and subtropical algebras are isomorphic (Blondel et al. 2000), which can be seen by taking the logarithm of the subtropical algebra or the exponent of the tropical algebra (with the conventions that \(\log 0 = -\infty \) and \(\exp (-\infty ) = 0\)). Thus, most of the results we prove for subtropical algebra can be extended to their tropical analogues, although caution should be used when dealing with approximate matrix factorizations. The latter is because, as we will see in Theorem 4, the *reconstruction error* of an approximate matrix factorization under the two different algebras does not transfer directly.

*Matrix products and ranks* The matrix product over the subtropical algebra is defined in the natural way:

### Definition 3

The *max-times matrix product* of two matrices \(\varvec{B} \in \mathbb {R}_+^{n \times k}\) and \(\varvec{C} \in \mathbb {R}_+^{k \times m}\) is defined as

\((\varvec{B} \boxtimes \varvec{C})_{ij} = \max _{1 \le s \le k} \varvec{B}_{is}\varvec{C}_{sj}, \quad i \in [n],\ j \in [m].\)

We will also need the matrix product over the *tropical* algebra.

### Definition 4

The *tropical matrix product* of two matrices \(\varvec{B} \in \overline{\mathbb {R}}^{n \times k}\) and \(\varvec{C} \in \overline{\mathbb {R}}^{k \times m}\) is defined as

\((\varvec{B} \boxdot \varvec{C})_{ij} = \max _{1 \le s \le k} \lbrace \varvec{B}_{is} + \varvec{C}_{sj}\rbrace , \quad i \in [n],\ j \in [m].\)
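Both products are straightforward to implement; the following sketch (helper functions of ours) also confirms numerically that exponentiation carries the tropical product to the max-times product, in line with the isomorphism noted above:

```python
import numpy as np

def max_times(B, C):
    # max-times product: entry (i, j) is max_s B[i, s] * C[s, j]
    return np.max(B[:, :, None] * C[None, :, :], axis=1)

def max_plus(B, C):
    # tropical product: entry (i, j) is max_s (B[i, s] + C[s, j])
    return np.max(B[:, :, None] + C[None, :, :], axis=1)

rng = np.random.default_rng(3)
B = rng.normal(size=(4, 3))
C = rng.normal(size=(3, 5))

# exp turns max-plus into max-times (and log goes the other way)
assert np.allclose(np.exp(max_plus(B, C)), max_times(np.exp(B), np.exp(C)))
```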

The *matrix rank* over the subtropical algebra can be defined in many ways, depending on which definition of the normal matrix rank is taken as the starting point. We will discuss different subtropical ranks in detail in Sect. 3.4. Here we give the main definition of the rank we are using throughout this paper, the so-called *Schein* (or *Barvinok*) *rank* of a matrix.

### Definition 5

The *max-times (Schein or Barvinok) rank* of a matrix \(\varvec{A} \in \mathbb {R}_+^{n \times m}\) is the least integer *k* such that \(\varvec{A}\) can be expressed as an element-wise maximum of *k* rank-1 matrices, \(\varvec{A} = \max \lbrace \varvec{A}^{(1)}, \varvec{A}^{(2)}, \ldots , \varvec{A}^{(k)}\rbrace \). A matrix \(\varvec{A}\) has subtropical (Schein/Barvinok) rank of 1 if there exist column vectors \(\varvec{b} \in \mathbb {R}_+^{n}\) and \(\varvec{c} \in \mathbb {R}_+^{m}\) such that \(\varvec{A} = \varvec{b}\varvec{c}^{T}\). Matrices with subtropical Schein (or Barvinok) rank of 1 are called *blocks*.

When it is clear from the context, we will use the term *rank* (or *subtropical rank*) without other qualifiers to denote the subtropical Schein/Barvinok rank.

*Special matrices* The final concepts we need in this paper are *pattern matrices* and *dominating matrices*.

### Definition 6

A *pattern* of a matrix \(\varvec{A} \in \mathbb {R}_+^{n \times m}\) is the n-by-m binary matrix \(\mathbf {p}(\varvec{A})\) such that \(\mathbf {p}(\varvec{A})_{ij} = 1\) if and only if \(\varvec{A}_{ij} \ne 0\), and otherwise \(\mathbf {p}(\varvec{A})_{ij} = 0\).

### Definition 7

Let \(\varvec{A}\) and \(\varvec{X}\) be matrices of the same size, and let \(\varGamma \) be a subset of their indices. If for all indices \((i, j) \in \varGamma \) we have \(\varvec{X}_{ij} \le \varvec{A}_{ij}\), we say that \(\varvec{A}\) *dominates* \(\varvec{X}\) *within* \(\varGamma \). If \(\varGamma \) spans the entire size of \(\varvec{A}\) and \(\varvec{X}\), we simply say that \(\varvec{A}\) *dominates* \(\varvec{X}\). Correspondingly, \(\varvec{X}\) is said to be *dominated by* \(\varvec{A}\).
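A dominance check in the sense of Definition 7 can be written directly (a small helper of our own, for illustration):

```python
import numpy as np

def dominates(A, X, indices=None):
    """True if X[i, j] <= A[i, j] for all (i, j) in `indices`
    (all entries when indices is None)."""
    if indices is None:
        return bool((X <= A).all())
    return all(X[i, j] <= A[i, j] for i, j in indices)

A = np.array([[3.0, 2.0], [1.0, 4.0]])
X = np.array([[2.0, 2.0], [1.0, 5.0]])

assert dominates(A, X, indices=[(0, 0), (0, 1), (1, 0)])  # within Γ
assert not dominates(A, X)  # X[1, 1] = 5 > A[1, 1] = 4
```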

*Main problem definition* Now that we have sufficient notation, we can formally introduce the main problem considered in the paper.

### Problem 1

(*Approximate subtropical rank-k matrix factorization*) Given a matrix \(\varvec{A} \in \mathbb {R}_+^{n \times m}\) and an integer \(k > 0\), find factor matrices \(\varvec{B} \in \mathbb {R}_+^{n \times k}\) and \(\varvec{C} \in \mathbb {R}_+^{k \times m}\) minimizing

\(E(\varvec{B}, \varvec{C}) = \Vert \varvec{A} - \varvec{B} \boxtimes \varvec{C}\Vert .\)

Here we have deliberately not specified any particular norm. Depending on the circumstances, different matrix norms can be used, but in this paper we will consider the two most natural choices—the Frobenius and \(L_1\) norms.

## 3 Theory

Our main contributions in this paper are the algorithms for subtropical matrix factorization. But before we present them, it is important to understand the theoretical aspects of subtropical factorizations. We will start by studying the computational complexity of Problem 1, showing that it is NP-hard even to approximate. After that, we will show that the dominated subtropical factorizations of sparse matrices are sparse. Then we compare the subtropical factorizations to factorizations over other algebras, analyzing how the error of an approximate decomposition behaves when moving from tropical to subtropical algebra. Finally, we briefly summarize different ways to define the subtropical rank, and how these different ranks can be used to bound each other, and the Boolean rank of a binary matrix, as well.

### 3.1 Computational complexity

The computational complexity of different matrix factorization problems varies. For example, SVD can be computed in polynomial time (Golub and Van Loan 2012), while NMF is NP-hard (Vavasis 2009). Unfortunately, the subtropical factorization is also NP-hard.

### Theorem 1

Computing the max-times matrix rank is an NP-hard problem, even for binary matrices.

The theorem is a direct consequence of the following theorem by Kim and Roush (2005):

### Theorem 2

(Kim and Roush 2005) Computing the max-plus (tropical) matrix rank is NP-hard, even for matrices that take values only from \(\{-\infty , 0\}\).

While computing the rank deals with exact decompositions, its hardness makes any approximation algorithm with provable multiplicative guarantees unlikely to exist, as the following corollary shows.

### Corollary 1

It is NP-hard to approximate Problem 1 to within any polynomially computable factor.

### Proof

Any algorithm that can approximate Problem 1 to within a factor \(\alpha \) must find a decomposition with error \(\alpha \cdot 0 = 0\) whenever the input matrix has an exact max-times rank-*k* decomposition. As this implies solving the max-times rank, by Theorem 1 it is only possible if \(\text {P} = \text {NP}\). \(\square \)

### 3.2 Sparsity of the factors

It is often desirable to obtain sparse factor matrices when the original data is sparse as well, and the sparsity of the factors is frequently mentioned as one of the benefits of using NMF (see, e.g. Hoyer 2004). In general, however, the factors obtained by NMF need not be sparse; but if we restrict ourselves to *dominated* decompositions, Gillis and Glineur (2010) showed that the sparsity of the factors cannot be less than the sparsity of the original matrix.

Let the *sparsity* of an n-by-m matrix \(\varvec{A}\), \(d(\varvec{A})\), be defined as

\(d(\varvec{A}) = \dfrac{nm - {{\,\mathrm{nnz}\,}}(\varvec{A})}{nm},\)

where \({{\,\mathrm{nnz}\,}}(\varvec{A})\) is the number of nonzero elements in \(\varvec{A}\). Now we have

### Theorem 3

### Proof

…for every *l* the estimate (8) holds. \(\square \)
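For concreteness, the sparsity measure is straightforward to compute, assuming the fraction-of-zeros definition \(d(\varvec{A}) = (nm - {{\,\mathrm{nnz}\,}}(\varvec{A}))/(nm)\):

```python
import numpy as np

def sparsity(A):
    """Fraction of zero entries: d(A) = (n*m - nnz(A)) / (n*m)."""
    return 1.0 - np.count_nonzero(A) / A.size

A = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])

assert np.count_nonzero(A) == 2
assert abs(sparsity(A) - 4.0 / 6.0) < 1e-12  # 4 zeros out of 6 entries
```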

### 3.3 Relation to other algebras

Let us now study how the max-times algebra relates to other algebras, especially the standard, the Boolean, and the max-plus algebras. For the first two, we compare the ranks, and for the last, the reconstruction error.

Let us start by considering the Boolean rank of a binary matrix. The *Boolean (Schein or Barvinok) rank* is the following problem:

### Problem 2

(*Boolean rank*) Given a binary matrix \(\varvec{A} \in \{0, 1\}^{n \times m}\), find the smallest integer *k* such that there exist binary matrices \(\varvec{B} \in \{0, 1\}^{n \times k}\) and \(\varvec{C} \in \{0, 1\}^{k \times m}\) that satisfy \(\varvec{A} = \varvec{B} \circ \varvec{C}\), where \(\circ \) is the *Boolean matrix product*,

\((\varvec{B} \circ \varvec{C})_{ij} = \bigvee _{s=1}^{k} \varvec{B}_{is} \wedge \varvec{C}_{sj}.\)

### Lemma 1

If \(\varvec{A}\) is a binary matrix, then its Boolean and subtropical ranks are the same.

### Proof

We will prove the claim by first showing that the Boolean rank of a binary matrix is no less than the subtropical rank, and then showing that it is no larger, either. For the first direction, let the Boolean rank of \(\varvec{A}\) be *k*, and let \(\varvec{B}\) and \(\varvec{C}\) be binary matrices such that \(\varvec{B}\) has *k* columns and \(\varvec{A} = \varvec{B} \circ \varvec{C}\). It is easy to see that \(\varvec{B} \circ \varvec{C} = \varvec{B} \boxtimes \varvec{C}\) for binary matrices, and hence, the subtropical rank of \(\varvec{A}\) is no more than *k*.

For the second direction, let the subtropical rank of \(\varvec{A}\) be *k* and let \(\varvec{B} \in \mathbb {R}_+^{n \times k}\) and \(\varvec{C} \in \mathbb {R}_+^{k \times m}\) be such that \(\varvec{A} = \varvec{B} \boxtimes \varvec{C}\). Let (*i*, *j*) be such that \(\varvec{A}_{ij} = 0\). By definition, \(\max _{s} \varvec{B}_{is}\varvec{C}_{sj} = 0\), and hence

\((\mathbf {p}(\varvec{B}) \circ \mathbf {p}(\varvec{C}))_{ij} = 0. \qquad (9)\)

On the other hand, if (*i*, *j*) is such that \(\varvec{A}_{ij} = 1\), then there exists *l* such that \(\varvec{B}_{il}\varvec{C}_{lj} > 0\) and consequently,

\((\mathbf {p}(\varvec{B}) \circ \mathbf {p}(\varvec{C}))_{ij} = 1. \qquad (10)\)

Combining (9) and (10) gives us

\(\varvec{A} = \mathbf {p}(\varvec{B}) \circ \mathbf {p}(\varvec{C}),\)

showing that the Boolean rank of \(\varvec{A}\) is at most *k*. \(\square \)

Notice that Lemma 1 also furnishes us with another proof of Theorem 1, as computing the Boolean rank is NP-hard (see, e.g. Miettinen 2009). Notice also that while the Boolean rank of the pattern is never more than the subtropical rank of the original matrix, it can be much less. This is easy to see by considering a matrix with no zeroes: it can have arbitrarily large subtropical rank, but its pattern has Boolean rank 1.

*n* (the result follows from similar results regarding the Boolean rank, see, e.g. Miettinen 2009).

As we have discussed earlier, the max-plus and max-times algebras are isomorphic, and consequently for any matrix \(\varvec{A} \in \mathbb {R}_+^{n \times m}\) its max-times rank agrees with the max-plus rank of the matrix \(\log (\varvec{A})\), where the logarithm is taken element-wise. Yet, the errors obtained in approximate decompositions do not have to (and usually will not) agree. In what follows we characterize the relationship between max-plus and max-times errors. We denote by \(\overline{\mathbb {R}}\) the extended real line \(\mathbb {R}\cup \lbrace -\infty \rbrace \).

### Theorem 4

Let \(\varvec{A}, \varvec{\alpha } \in \overline{\mathbb {R}}^{n \times m}\), and let \(\lambda _{ij} \ge 0\) be such that \((A_{ij} - \alpha _{ij})^2 \le \lambda _{ij}\) for all *i*, *j*, with \(\sum _{ij} \lambda _{ij} = \lambda \). Then

\(\Vert \exp (\varvec{A}) - \exp (\varvec{\alpha })\Vert _F^2 \le \lambda \exp \bigl (2 \max _{ij} \max \lbrace A_{ij}, \alpha _{ij}\rbrace \bigr ),\)

where the exponential is applied element-wise.

### Proof

Assume that for all *i*, *j* we have \((A_{ij} - \alpha _{ij})^2 \le \lambda _{ij}\) and \(\sum _{ij} \lambda _{ij} = \lambda \). By the mean-value theorem, for every *i* and *j* we obtain

\(\exp (A_{ij}) - \exp (\alpha _{ij}) = \exp (\xi _{ij})(A_{ij} - \alpha _{ij})\)

for some \(\xi _{ij} \in [\min \lbrace A_{ij}, \alpha _{ij}\rbrace , \max \lbrace A_{ij}, \alpha _{ij}\rbrace ]\). Hence,

\((\exp (A_{ij}) - \exp (\alpha _{ij}))^2 \le \exp (2\xi _{ij})\lambda _{ij}.\)

The estimate for the max-times error now follows from the monotonicity of the exponent:

\(\sum _{ij} (\exp (A_{ij}) - \exp (\alpha _{ij}))^2 \le \sum _{ij} \lambda _{ij} \exp \bigl (2 \max \lbrace A_{ij}, \alpha _{ij}\rbrace \bigr ) \le \lambda \exp \bigl (2 \max _{ij} \max \lbrace A_{ij}, \alpha _{ij}\rbrace \bigr ),\)

proving the claim. \(\square \)

### 3.4 Different subtropical matrix ranks

The definition of the subtropical rank we use in this work is the so-called Schein (or Barvinok) rank (see Definition 5). As in the standard linear algebra, this is not the only possible way to define the (subtropical) rank. Here we will review a few other forms of subtropical rank that allow us to bound the Schein/Barvinok rank of a matrix. Unless otherwise mentioned, the definitions are by Guillon et al. (2015); naturally, results without citations are ours. Following Guillon et al., we will present the definitions in this section over the tropical algebra. Recall that due to the isomorphism, these definitions transfer directly to the subtropical case.

We begin with the tropical equivalent of the subtropical Schein/Barvinok rank:

### Definition 8

The *tropical Schein/Barvinok rank* of a matrix \(\varvec{A} \in \overline{\mathbb {R}}^{n \times m}\), denoted \({{\,\mathrm{rank}\,}}_{SB}(\varvec{A})\), is defined to be the least integer *k* such that there exist matrices \(\varvec{B} \in \overline{\mathbb {R}}^{n \times k}\) and \(\varvec{C} \in \overline{\mathbb {R}}^{k \times m}\) for which \(\varvec{A} = \varvec{B} \boxdot \varvec{C}\).

Analogous to the standard case, we can also define the rank as the number of linearly independent rows or columns. The following definition of linear independence of a family of vectors in a tropical space is due to Gondran and Minoux (1984b).

### Definition 9

A family of vectors \(\varvec{x}_1, \ldots , \varvec{x}_k \in \overline{\mathbb {R}}^{n}\) is called *linearly dependent* if there exist disjoint sets \(I, J \subset [k]\) and scalars \(\{\lambda _i\}_{i\in I \cup J}\), such that \(\lambda _i \ne -\infty \) for all *i* and

\(\max _{i \in I} \lbrace \lambda _i + \varvec{x}_i\rbrace = \max _{j \in J} \lbrace \lambda _j + \varvec{x}_j\rbrace .\)

Otherwise, the vectors are called *linearly independent*.

This gives rise to the so-called *Gondran–Minoux ranks*:

### Definition 10

The *Gondran–Minoux* row (column) rank of a matrix \(\varvec{A} \in \overline{\mathbb {R}}^{n \times m}\) is defined as the maximal *k* such that \(\varvec{A}\) has *k* linearly independent rows (columns). They are denoted by \({{\,\mathrm{rank}\,}}_{GM}^{r}(\varvec{A})\) and \({{\,\mathrm{rank}\,}}_{GM}^{c}(\varvec{A})\), respectively.

Another way to characterize the rank of the matrix is to consider the space its rows or columns can span.

### Definition 11

A set \(X \subset \overline{\mathbb {R}}^{n}\) is called *tropically convex* if for any vectors \(\varvec{x}, \varvec{y} \in X\) and scalars \(\lambda , \mu \in \overline{\mathbb {R}}\), we have \(\max \lbrace \lambda + \varvec{x}, \mu + \varvec{y}\rbrace \in X\).

### Definition 12

The *(tropical) convex hull* \(H(\varvec{x}_1, \dots , \varvec{x}_k)\) of a finite set of vectors \(\varvec{x}_1, \ldots , \varvec{x}_k \in \overline{\mathbb {R}}^{n}\) is defined as follows:

\(H(\varvec{x}_1, \dots , \varvec{x}_k) = \bigl \lbrace \max _{1 \le i \le k} \lbrace \lambda _i + \varvec{x}_i\rbrace : \lambda _1, \ldots , \lambda _k \in \overline{\mathbb {R}} \bigr \rbrace .\)

### Definition 13

The *weak dimension* of a finitely generated tropically convex subset of \(\overline{\mathbb {R}}^{n}\) is the cardinality of its minimal generating set.

We can define the rank of the matrix by looking at the weak dimension of the (tropically) convex hull its rows or columns span.

### Definition 14

The *row rank* and the *column rank* of a matrix \(\varvec{A} \in \overline{\mathbb {R}}^{n \times m}\) are defined as the weak dimensions of the convex hulls of the rows and the columns of \(\varvec{A}\), respectively. They are denoted by \({{\,\mathrm{rank}\,}}_{r}(\varvec{A})\) and \({{\,\mathrm{rank}\,}}_{c}(\varvec{A})\).

Unlike in the standard algebra, none of the above definitions coincide in general (see Akian et al. 2009). We can, however, give a partial ordering of the ranks:

### Theorem 5

For any matrix \(\varvec{A} \in \overline{\mathbb {R}}^{n \times m}\),

\(\max \bigl \lbrace {{\,\mathrm{rank}\,}}_{GM}^{r}(\varvec{A}),\, {{\,\mathrm{rank}\,}}_{GM}^{c}(\varvec{A})\bigr \rbrace \le {{\,\mathrm{rank}\,}}_{SB}(\varvec{A}) \le \min \bigl \lbrace {{\,\mathrm{rank}\,}}_{r}(\varvec{A}),\, {{\,\mathrm{rank}\,}}_{c}(\varvec{A})\bigr \rbrace .\)

The row and column ranks of an n-by-n tropical matrix can be computed in \(O(n^3)\) time (Butkovič 2010), allowing us to bound the Schein/Barvinok rank from above. Unfortunately, no efficient algorithm for the Gondran–Minoux rank is known. On the other hand, Guillon et al. (2015) presented what they called the *ultimate tropical rank*, which lower-bounds the Gondran–Minoux rank and can be computed in time \(O(n^3)\). We can also check whether a matrix has full Schein/Barvinok rank in time \(O(n^3)\) (see Butkovič and Hevery 1985), even though computing any other value of that rank is NP-hard.

These bounds, together with Lemma 1, yield the following corollary on bounding the *Boolean rank* of a square binary matrix:

### Corollary 2

Given an n-by-n binary matrix \(\varvec{A}\), its Boolean rank can be bounded from below, using the ultimate rank, and from above, using the tropical column and row ranks, in time \(O(n^3)\).

## 4 Algorithms

There are some unique challenges in subtropical matrix factorization that stem from the lack of linearity and smoothness of the max-times algebra. One such issue is that dominated elements in a decomposition have no impact on the final result. Namely, if we consider the subtropical product of two matrices \(\varvec{B} \in \mathbb {R}_+^{n \times k}\) and \(\varvec{C} \in \mathbb {R}_+^{k \times m}\), we can see that each entry \((\varvec{B} \boxtimes \varvec{C})_{ij}\) is completely determined by the single index \(s^* = {{\,\mathrm{arg\,max}\,}}_{s} \varvec{B}_{is}\varvec{C}_{sj}\). This means that all entries *t* with \(\varvec{B}_{it}\varvec{C}_{tj} < \varvec{B}_{is^*}\varvec{C}_{s^*j}\) do not contribute at all to the final decomposition. To see why this is a problem, observe that many optimization methods used in matrix factorization algorithms rely on local information to choose the direction of the next step (e.g. various forms of gradient descent). In the case of the subtropical algebra, however, this local information is practically absent, and hence we need to look elsewhere for effective optimization techniques.
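The absence of local information can be demonstrated directly: perturbing a dominated factor element (below the entry-wise maximum) leaves the subtropical product, and hence the error, unchanged. A toy sketch (our `max_times` helper, illustrative values):

```python
import numpy as np

def max_times(B, C):
    # subtropical product: entry (i, j) is max_s B[i, s] * C[s, j]
    return np.max(B[:, :, None] * C[None, :, :], axis=1)

B = np.array([[4.0, 0.1],
              [3.0, 0.2]])
C = np.array([[1.0, 2.0],
              [1.0, 1.0]])
base = max_times(B, C)

# B[0, 1] is dominated everywhere: changing it (while staying below the
# maximum) has no effect, so a gradient-style method gets no signal from it
B2 = B.copy()
B2[0, 1] = 0.5
assert np.array_equal(max_times(B2, C), base)
```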

A common approach to matrix decomposition problems is to update the factor matrices alternatingly, which exploits the fact that, under the standard algebra, the problem \(\min _{\varvec{B}, \varvec{C}} \Vert \varvec{A} - \varvec{B}\varvec{C}\Vert \) is biconvex. Unfortunately, the subtropical matrix factorization problem does not have the biconvexity property, which makes alternating updates less useful.

Here we present a different approach that, instead of doing alternating factor updates, constructs the decomposition by adding one rank-1 matrix at a time, following the idea by Kolda and O’Leary (2000). The corresponding framework is called Equator (Algorithm 1).

The framework splits the rank-*k* factorization into *k* subproblems of the following form: given a rank-\((l-1)\) decomposition \(\varvec{B} \in \mathbb {R}_+^{n \times (l-1)}\), \(\varvec{C} \in \mathbb {R}_+^{(l-1) \times m}\) of a matrix \(\varvec{A} \in \mathbb {R}_+^{n \times m}\), find a column vector \(\varvec{b} \in \mathbb {R}_+^{n}\) and a row vector \(\varvec{c} \in \mathbb {R}_+^{m}\) such that the error

\(E(\varvec{b}, \varvec{c}) = \Vert \varvec{A} - \max \lbrace \varvec{B} \boxtimes \varvec{C},\, \varvec{b}\varvec{c}\rbrace \Vert \qquad (17)\)

is minimized, where the maximum is taken element-wise. We assume by definition that the rank-0 decomposition is an all-zero matrix of the same size as \(\varvec{A}\). The problem of rank-*k* subtropical matrix factorization is then reduced to solving (17) *k* times. One should of course remember that this scheme is just a heuristic, and finding optimal blocks on each iteration does not guarantee convergence to a global minimum.

A block that is good at an early stage of the algorithm might not fit well into the final rank-*k* decomposition. This is because for smaller ranks we generally have to cover the data more crudely, whereas when the rank increases we can afford to use smaller and more refined blocks. In order to deal with this problem, we find and then update the blocks repeatedly, in a cyclic fashion: after discovering the last block, we go all the way back to block one. The input parameter *M* defines the number of full cycles we make.
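The greedy, cyclic structure described above can be sketched schematically as follows. This is our simplified stand-in, with a trivial random candidate search in place of the actual block-update routine; it is not the authors' implementation:

```python
import numpy as np

def max_times(B, C):
    # subtropical product: entry (i, j) is max_s B[i, s] * C[s, j]
    return np.max(B[:, :, None] * C[None, :, :], axis=1)

def greedy_subtropical(A, k, M=3, candidates=20, seed=0):
    """Rank-k max-times decomposition built one rank-1 block at a time,
    with M cyclic passes over the blocks (schematic stand-in search)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    B, C = np.zeros((n, k)), np.zeros((k, m))
    best_err = np.abs(A - max_times(B, C)).sum()  # all-zero start
    for _ in range(M):                   # M full cycles over the blocks
        for l in range(k):               # revisit and update block l
            for _ in range(candidates):  # stand-in for the block search
                b, c = rng.random(n), rng.random(m)
                Bt, Ct = B.copy(), C.copy()
                Bt[:, l], Ct[l, :] = b, c
                err = np.abs(A - max_times(Bt, Ct)).sum()
                if err < best_err:       # keep only improving updates
                    best_err, B, C = err, Bt, Ct
    return best_err, B, C

rng = np.random.default_rng(1)
A = max_times(rng.random((8, 2)), rng.random((2, 7)))  # planted structure
err, B, C = greedy_subtropical(A, k=2)
assert err <= np.abs(A).sum()  # never worse than the all-zero decomposition
```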

On a high level, the framework works as follows. First, the factor matrices are initialized to all zeros (line 2). Since the algorithm makes iterative changes to the current solution that might in some cases worsen the results, it also stores the best reconstruction error and the corresponding factors found so far; these are initialized with the starting solution on lines 3–4. The main work is done in the loop on lines 5–10, where on each iteration we update a single rank-1 matrix in the current decomposition (line 7), and then check whether the update improves the best result (lines 8–10).

We will present two versions of the block-update function, one called Capricorn and the other called Cancer. Capricorn is designed to work with discrete (or flipping) noise, where some of the elements in the data are randomly changed to different values. In this setting the level of noise is the proportion of flipped elements relative to the total number of nonzeros. Cancer, on the other hand, is robust against continuous noise, where many elements are affected (e.g. Gaussian noise). We will discuss both in detail in the following subsections. In the rest of the paper, especially when presenting the experiments, we will use the names Capricorn and Cancer not only for the specific variant of the block-update function, but also for the full algorithm that uses it.

### 4.1 Capricorn

We first describe Capricorn, which is designed to solve the subtropical matrix factorization problem in the presence of discrete noise, and which minimizes the \(L_{1}\) norm of the error matrix. The main idea behind the algorithm is to spot potential blocks by considering ratios of matrix rows. Consider an arbitrary rank-1 block \(\varvec{b}\varvec{c}\), where \(\varvec{b}\) is a nonnegative column vector and \(\varvec{c}\) a nonnegative row vector. For any indices *i* and *j* such that \(\varvec{b}_i>0\) and \(\varvec{b}_j>0\), we have \((\varvec{b}\varvec{c})_{i\cdot } = (\varvec{b}_i/\varvec{b}_j)(\varvec{b}\varvec{c})_{j\cdot }\). This is a characteristic property of rank-1 matrices: all rows are multiples of one another. Hence, if a block \(\varvec{b}\varvec{c}\) dominates some region \(\varGamma \) of a matrix \(\varvec{A}\), then within \(\varGamma \) the rows of \(\varvec{A}\) should all be multiples of each other. These rows might have different lengths due to block overlap, in which case the rule applies only to their common part.

Elements that are already covered by previously found blocks are set to *NaN*, indicating that those values need not be explained again. We then select a seed row (line 4), with the intention of growing a block around it. We choose the row with the largest sum, as this increases the chances of finding the most prominent block. In order to find the best block through which the seed row passes, we first find a binary matrix that represents the pattern of that block (line 5). Next, on lines 6–9 we choose an approximation of the block pattern with index sets \({b\_idx}\) and \(c\_idx\), which define which elements of \(\varvec{b}\) and \(\varvec{c}\) should be nonzero. The next step is to find the actual values of the elements within the block (line 10). Finally, we inflate the found core block by adding further rows and columns that fit its pattern (line 11).

At this point we know the pattern of the new block, that is, the locations of its non-zeros. To fill in the actual values, we consider the submatrix defined by the pattern and find the best rank-1 approximation of it, using a dedicated recovery function (Algorithm 5). It begins by setting all elements outside of the pattern to 0, as they are irrelevant to the block (line 2). Then it chooses one row to represent the block (lines 3–4), which will be used to find a good rank-1 cover.
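The idea of filling in block values from a representative row can be sketched as follows; the helper name and the use of a median for the row scales are our simplifications of the paper's routine:

```python
import numpy as np

def recover_block(A):
    """Fill in the values of a rank-1 block b*c from a (noisy) submatrix
    whose out-of-pattern cells are 0 (NaN marks covered cells). Sketch only:
    picks one representative row and scales every other row onto it, whereas
    the paper's routine tries each candidate row and keeps the best."""
    A = np.asarray(A, dtype=float)
    r = np.argmax(np.nansum(A, axis=1))          # representative row
    c = A[r].copy()
    b = np.zeros(A.shape[0])
    for i in range(A.shape[0]):
        mask = (c > 0) & (A[i] > 0)
        if mask.any():
            b[i] = np.median(A[i, mask] / c[mask])   # robust row scale
    return b, c
```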

**Parameters** Capricorn has four parameters in addition to the common parameters of the Equator framework: a correlation-count threshold, \(\delta >0\), \(\theta >0\), and \(\tau \in [0,1]\). The first one determines the minimum number of elements in two rows that must have “approximately” the same ratio for the rows to be considered for building a block. The parameter \(\delta \) defines the bucket width when computing row correlations. When expanding a block, \(\theta \) is used to decide whether to add a row (or column) to it: the decision is positive whenever expression (19) is at most \(\theta \). Finally, \(\tau \) is used during the discovery of correlated rows; it belongs to the closed unit interval, and the higher it is, the more rows will be added.

### 4.2 Cancer

We now present our second algorithm, Cancer, a counterpart of Capricorn specifically designed to work in the presence of high levels of continuous noise. The reason Capricorn cannot deal with continuous noise is that it expects the rows in a block to have an “almost” constant elementwise ratio, which is not the case when too many entries in the data are disturbed; even low levels of Gaussian noise make the ratios vary enough to hinder Capricorn’s ability to spot blocks. With Cancer we take a new approach based on polynomial approximation of the objective. We also replace the \(L_1\) matrix norm, which was the objective of Capricorn, with the Frobenius norm. The reason is that when the noise is continuous, its level is defined as the total deviation of the noisy data from the original, rather than a count of altered elements; this makes the Frobenius norm a good estimator for the amount of noise. Cancer conforms to the general framework of Equator (Algorithm 1), and differs from Capricorn only in how it finds the blocks and in the objective function.

Observe that in order to solve problem (17) we need to find a column vector \(\varvec{b}\) and a row vector \(\varvec{c}\) that provide the best rank-1 approximation of the input matrix given the current factorization. The objective function is not convex in either \(\varvec{b}\) or \(\varvec{c}\) and is generally hard to optimize directly, so we simplify the problem in two steps. First, instead of fully optimizing \(\varvec{b}\) and \(\varvec{c}\) simultaneously, we update only a single element of one of them at a time; this reduces the problem to single-variable optimization. Even then the objective is hard to minimize, so we replace it with a polynomial approximation, which is easy to optimize directly.

The Cancer version of the UpdateBlock function is described in Algorithm 7. It alternatingly updates the vectors \(\varvec{b}\) and \(\varvec{c}\) using the AdjustOneElement routine; both \(\varvec{b}\) and \(\varvec{c}\) will be updated \(\lfloor f (n+m)/2\rfloor \) times. UpdateBlock starts by finding the index of the block that has to be changed (line 2). Since the purpose of UpdateBlock is to find the best rank-1 matrix to replace the current block, we also need to compute the reconstructed matrix without it, which is done on line 3. We then find the number of times AdjustOneElement will be called (line 4) and set the degree of the polynomials used for approximating the objective function (line 5). This is needed because high-degree polynomials are good at refining a solution that is already reasonable, but tend to overfit the data and cause the algorithm to get stuck in local minima at the beginning. It is therefore beneficial to start with polynomials of lower degree and then gradually increase it. The actual changes to \(\varvec{b}\) and \(\varvec{c}\) happen in the loop (lines 7–9), where we update them using AdjustOneElement.

AdjustOneElement updates a single entry of \(\varvec{b}\) or \(\varvec{c}\). Say we are updating \(\varvec{c}\): we then try updating each of its *m* entries and choose the one that yields the most improvement to the objective. A single element \(\varvec{c}_l\) only has an effect on the error along the column *l*. Assume that we are currently updating the block with index *q*, and let \(\varvec{R}\) denote the reconstruction matrix without this block, that is, \(\varvec{R}_{ij} = \max _{t \ne q} \varvec{B}_{it} \varvec{C}_{tj}\). Minimizing the error with respect to \(\varvec{c}_l\) is then equivalent to minimizing
$$\begin{aligned} \gamma (x) = \sum _{i=1}^{n} \bigl ( \max \{ \varvec{R}_{il},\, \varvec{b}_i x \} - \varvec{A}_{il} \bigr )^2 . \end{aligned}$$
(20)
Instead of minimizing (20) directly, we use polynomial approximation in the PolyMin routine (line 4). It returns the (approximate) error \({\textit{err}}\) and the value *x* achieving it. The polynomial approximation is obtained by evaluating the objective function at \(deg+1\) points generated uniformly at random from the interval [0, 5] and then fitting a polynomial to the obtained values. The upper bound of 5 does not have any special meaning; it was chosen by trial and error. PolyMin is a heuristic and does not necessarily find the global minimum of the objective function; in rare cases it might even increase the objective value. In such cases it would, in theory, make sense to keep the value prior to the update, so that the objective at least does not increase, but in practice this phenomenon helps to escape local minima. Since we are only interested in the improvement of the objective achieved by updating a single entry of \(\varvec{c}\), we compute the improvement of the objective after the change (line 5). After trying every entry of \(\varvec{c}\), we update only the one that yields the largest improvement.

The function \(\gamma \) that we need to minimize in order to find the best change to the vector \(\varvec{c}\) in AdjustOneElement is hard to work with directly: it is not convex, and it is not smooth because of the presence of the maximum operator. To alleviate this, we approximate the error function \(\gamma \) with a polynomial *g* of degree *deg*. Notice that when updating \(\varvec{c}_l\), the other variables of \(\gamma \) are fixed, and we only need to consider the single-variable function \(\gamma '(x)\) obtained by holding all other arguments constant. To build *g* we sample \(deg+1\) points uniformly at random and fit *g* to the values of \(\gamma '\) at these points. We then find the *x* that minimizes *g*(*x*) and return *g*(*x*) (the approximate error) and *x* (the optimal value).

**Parameters** Cancer has two parameters, \(t>2\) and \(0<f<1\), that control its execution. The first one, *t*, is the maximum allowed degree of the polynomials used for approximating the objective; we set it to 16 in all our experiments. The second, *f*, determines the number of single-element updates we make to the row and column vectors of a block in UpdateBlock. To demonstrate that the chosen values of the parameters are reasonable, we performed a grid search over various parameter values (see Fig. 4 in Sect. 5).

**Generalized** Cancer The Cancer algorithm can be adapted to optimize other objective functions. Its general polynomial-approximation framework allows for a wide variety of possible objectives, the only constraint being that they have to be additive (we call a function \(E\) *additive* if there exists a mapping \(h\) such that for all matrices \(\varvec{A}\) and \(\varvec{X}\) of equal size we have \(E(\varvec{A}, \varvec{X}) = \sum _{i,j} h(\varvec{A}_{ij}, \varvec{X}_{ij})\)). Some examples of such functions are the \(L_1\) and Frobenius matrix norms, as well as the Kullback–Leibler and Jensen–Shannon divergences. In order to use the generalized form of Cancer, one simply replaces the Frobenius norm with another cost function wherever the error is evaluated.
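The additivity requirement means any elementwise cost can be plugged in. A sketch, assuming the convention \(0 \log 0 = 0\) (the paper's exact JS scaling may differ):

```python
import numpy as np

def make_error(h):
    """Turn an elementwise cost h(a, x) into the matrix objective sum_ij h(A_ij, X_ij)."""
    return lambda A, X: float(np.sum(h(A, X)))

l1        = make_error(lambda a, x: np.abs(a - x))
frobenius = make_error(lambda a, x: (a - x) ** 2)   # squared Frobenius norm

def js_elem(a, x, eps=1e-12):
    """Elementwise Jensen-Shannon divergence with the 0*log 0 = 0 convention."""
    m = (a + x) / 2
    t1 = np.where(a > 0, a * np.log((a + eps) / (m + eps)), 0.0)
    t2 = np.where(x > 0, x * np.log((x + eps) / (m + eps)), 0.0)
    return (t1 + t2) / 2

jensen_shannon = make_error(js_elem)
```

Each of these is additive in the sense above, so each can serve as the error function evaluated inside the update loop.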

### 4.3 Time complexity

The main work in Equator is performed inside the UpdateBlock routine, which is called *Mk* times. Since *M* is a constant parameter, the complexity of Equator is *k* times the complexity of UpdateBlock. In the following we find theoretical bounds on the execution time of UpdateBlock for both Capricorn and Cancer.

**Capricorn** In the case of Capricorn there are three main contributors to the cost of UpdateBlock (Algorithm 2): finding the rows correlated with the seed row, recovering the values of the block, and expanding the block. The correlation step compares every row to the seed row, each comparison processing all *m* elements of both rows, which results in a total complexity of *O*(*nm*). To find the complexity of the recovery step, first observe that any “pure” block can be represented as \(\varvec{b}\varvec{c}\), where \(\varvec{b}\) has \(n' \le n\) and \(\varvec{c}\) has \(m' \le m\) entries. The recovery routine selects \(\varvec{c}\) from the rows of the block and then finds the corresponding column vector \(\varvec{b}\) that minimizes the reconstruction error. In order to select the best row, we have to try each of the \(n'\) candidates, and since finding the corresponding \(\varvec{b}\) for each of them takes time \(O(n'm')\), the runtime of the recovery step is \(O(n')O(n'm') = O(n^2m)\). The most computationally expensive parts of the expansion step are the operations on line 4, finding the mean (line 7), and computing the impact (line 8), all of which run in *O*(*m*) time; they have to be repeated *O*(*n*) times, and hence the expansion step runs in *O*(*nm*) time. We can thus estimate the complexity of UpdateBlock as \(O(nm)+O(n^2m)+O(nm) = O(n^2m)\), which leads to a total runtime of \(O(n^2mk)\) for Capricorn.

**Cancer** Here UpdateBlock (Algorithm 7) is a loop that calls AdjustOneElement \(\lfloor f(n+m) \rfloor \) times. In AdjustOneElement the contributors to the complexity are computing the base error (line 3) and the call to PolyMin (line 4). Both are performed *n* or *m* times, depending on whether we supplied the column vector \(\varvec{b}\) or the row vector \(\varvec{c}\) to AdjustOneElement. Finding the base error takes time *O*(*m*) for \(\varvec{b}\) and *O*(*n*) for \(\varvec{c}\). The complexity of PolyMin boils down to that of evaluating the max-times objective at \(deg+1\) points and then minimizing a polynomial of degree *deg*. Hence, PolyMin runs in time *O*(*m*) or *O*(*n*) depending on whether we are optimizing \(\varvec{b}\) or \(\varvec{c}\), and the complexity of AdjustOneElement is *O*(*nm*).

Since AdjustOneElement is called \(\lfloor f(n+m)/2 \rfloor \) times for each of \(\varvec{b}\) and \(\varvec{c}\), and *f* is a fixed parameter, this gives the complexity \(O\bigl ((n+m)nm\bigr )\) for UpdateBlock and \(O\bigl ((n+m)nmk\bigr ) = O(\max \{n,m\}nmk)\) for Cancer.

Empirical evaluation of the time complexity is reported in Sect. 5.3.

## 5 Experiments

We tested both Capricorn and Cancer on synthetic and real-world data. In addition, we compare against a variation of Cancer that optimizes the Jensen–Shannon divergence, which we refer to as the JS variant of Cancer. The purpose of the synthetic experiments is to evaluate the properties of the algorithms in controlled environments where we know the data has the max-times structure. They also demonstrate on what kind of data each algorithm excels and what their limitations are. The purpose of the real-world experiments is to confirm that these observations also hold true in real-world data, and to study what kinds of data sets actually have max-times structure. The source code of Capricorn and Cancer and the scripts that run the experiments in this paper are freely available for academic use.^{3}

*Parameters of* Cancer. Both variations of Cancer use the same set of parameters. For the synthetic experiments we used \(M=14\), \(t=16\), and \(f=0.1\). For the real-world experiments we set \(t=16\), \(f=0.1\), and \(M=40\) (except for two of the datasets, where we used \(M=50\) and \(M=8\), respectively). Increasing *M*, which controls the number of cycles of Cancer's execution, almost invariably improves the results. At some point, though, the gains become marginal, and the value \(M=40\) was chosen so as to reach the point where increasing *M* further would not yield much improvement. Sometimes this point is reached faster: the smaller choice of *M* for one dataset is motivated by the fact that Cancer quickly reached a point where it could no longer make significant progress, despite that being the largest dataset. The relationship between the other two parameters and the quality of the decomposition is more complex. We see in Fig. 4a that the dependence on *f* and *t* is not monotone, and it is hard to pinpoint the best combination exactly. Moreover, the optimal values can differ between datasets; for example, Fig. 4b features an almost monotone dependence on *f* that flattens out before *f* reaches 0.1. From our experience, however, \(t=16\) and \(f=0.1\) seem to be a good choice.

*Parameters of* Capricorn. In both synthetic and real-world experiments we used the following default set of parameters: \(M=4\), \(\delta =0.01\), \(\theta =0.5\), \(\tau =0.5\), and the default value for the correlation-count parameter. As with Cancer, the dependency of the results on the parameters is complex, but the values chosen above seem to produce good results in most cases. We do not show a comparison table, as we did with Cancer, due to the larger number of parameters.

### 5.1 Other methods

The first comparison method is a sparse NMF algorithm,^{4} which we call SNMF. It defines the sparsity of a vector \(\varvec{x} \in \mathbb {R}^n\) as
$$\begin{aligned} {\text {sparsity}}(\varvec{x}) = \frac{\sqrt{n} - \Vert \varvec{x}\Vert _1 / \Vert \varvec{x}\Vert _2}{\sqrt{n} - 1} \end{aligned}$$
(21)
and returns factorizations where the sparsity of the factor matrices is user-controllable. Note that this definition of sparsity is different from the one we use elsewhere (see Equation (4)). In order to run SNMF we used the sparsity of Cancer’s factors (as defined by (21)) as its sparsity parameter. We also compare against a standard alternating least squares algorithm, which we call ALS (Cichocki et al. 2009). Next we have two versions of NMF that are essentially the same as ALS, but use \(L_1\) regularization for increased sparsity (Cichocki et al. 2009), that is, they aim at minimizing
$$\begin{aligned} \Vert \varvec{A} - \varvec{B}\varvec{C}\Vert _F^2 + \alpha \Vert \varvec{B}\Vert _1 + \beta \Vert \varvec{C}\Vert _1 . \end{aligned}$$
The first of these uses regularizer coefficients \(\alpha =\beta =1\), and the other \(\alpha =\beta =5\). It is natural to ask how the regularized versions would fare with different parameter values. In Fig. 5 we perform a grid search for the best parameter combination. While the real-world experiment has a very uneven surface without much structure apart from a couple of spikes, the synthetic dataset demonstrates that high values of \(\alpha \) and \(\beta \) can have serious adverse effects on the reconstruction error. It therefore seems safest to set \(\alpha =\beta =0\), which corresponds to the plain ALS method. It is worth mentioning that in many of our experiments larger values of \(\alpha \) and \(\beta \) resulted in the factors becoming close to zero, or in some elements of the factors attaining enormous values due to numeric instability. This was also the case for some other real-world experiments, which is another indication in favour of \(\alpha =\beta =0\).

The last NMF algorithm, by Li and Ngom (2013), is designed to work with missing values in the data.

### 5.2 Synthetic experiments

The purpose of the synthetic experiments is to prove the concept, that is, that our algorithms are capable of identifying the max-times structure when it is present. In order to test this, we first generate data with pure max-times structure, then pollute it with some level of noise, and finally run the methods. The noise-free data is created by first generating random factors of a given density, with nonzero elements drawn from the uniform distribution on the [0, 1] interval, and then multiplying them using the max-times matrix product.
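The generation step can be sketched as follows (names and the Bernoulli masking are ours; nonzero factor elements are uniform on [0, 1] as described):

```python
import numpy as np

def make_subtropical_data(n, m, k, density, rng=None):
    """Generate noise-free max-times data: sparse uniform factors combined
    with the subtropical product. A sketch of the paper's setup."""
    rng = np.random.default_rng(rng)
    B = rng.uniform(size=(n, k)) * (rng.uniform(size=(n, k)) < density)
    C = rng.uniform(size=(k, m)) * (rng.uniform(size=(k, m)) < density)
    A = np.max(B[:, :, None] * C[None, :, :], axis=1)   # max-times product
    return A, B, C
```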

We distinguish two types of noise. The first is discrete (or tropical) noise, which is introduced as follows. Assume that we are given an input matrix of size *n*-by-*m*. We first generate an *n*-by-*m* noise matrix \(\varvec{N}\) with elements drawn from the uniform distribution on the [0, 1] interval. Given a noise level *l*, we then turn \(\lfloor (1 - l)nm \rfloor \) random elements of \(\varvec{N}\) to 0, so that its resulting density is *l*. Finally, the noise is applied by taking the elementwise maximum of the original data and \(\varvec{N}\). This is the kind of noise that Capricorn was designed to handle, so we expect it to fare better than Cancer and the other comparison algorithms.

We also test against continuous noise, as it is arguably more common in the real world. For this we chose Gaussian noise with zero mean, where the noise level is defined as its standard deviation. Since adding this noise to the data might result in negative entries, we truncate all values below zero in the resulting matrix.
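Both noise models can be sketched like this. One simplification: the tropical variant below keeps each noise element with independent probability `level` instead of zeroing an exact count of elements:

```python
import numpy as np

def add_tropical_noise(A, level, rng=None):
    """Flip-style noise: a uniform noise matrix of density ~`level`,
    applied by elementwise maximum (the discrete-noise model)."""
    rng = np.random.default_rng(rng)
    N = rng.uniform(size=A.shape)
    N[rng.uniform(size=A.shape) >= level] = 0.0   # keep ~level fraction nonzero
    return np.maximum(A, N)

def add_gaussian_noise(A, std, rng=None):
    """Continuous noise: additive zero-mean Gaussian, truncated at zero."""
    rng = np.random.default_rng(rng)
    return np.maximum(A + rng.normal(0.0, std, size=A.shape), 0.0)
```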

Unless specified otherwise, all matrices in the synthetic experiments are of size 1000-by-800 with true max-times rank 10. All results presented in this section are averaged over 10 instances. For the reconstruction error tests, we compared our algorithms Capricorn, Cancer, and the JS variant of Cancer against the comparison methods described in Sect. 5.1. The error is measured as the relative Frobenius norm \(\Vert \varvec{A} - \tilde{\varvec{A}}\Vert _F / \Vert \varvec{A}\Vert _F\), where \(\varvec{A}\) is the data and \(\tilde{\varvec{A}}\) its approximation, as that is the measure that Cancer and the NMF-based methods aim at minimizing. We also report the sparsity *s* of the factor matrices obtained by the algorithms, defined as the fraction of zero elements in the factor matrices; see (4). For the experiments with tropical noise, the reconstruction errors are reported in Fig. 6 and factor sparsity in Fig. 7. For the Gaussian noise experiments, the reconstruction errors and factor sparsity are shown in Figs. 8 and 9, respectively.

*Varying density with tropical noise* In our first experiment we studied the effects of varying the density of the factor matrices in the presence of tropical noise. We changed the density of the factors from 10 to 100% in increments of 10%, while keeping the noise level at 10%. Figure 6a shows the reconstruction error and Fig. 7a the sparsity of the obtained factors. Capricorn is consistently the best method, obtaining almost perfect reconstruction; only when the density approaches 100% does its reconstruction error deviate slightly from 0. This is expected, since the data was generated with the tropical (flipping) noise that Capricorn is designed to handle. All other methods clearly underperform in comparison, with Cancer being the second best. Most NMF methods obtain results similar to those of ALS, while having a somewhat higher reconstruction error than Cancer. That ALS and the NMF methods start behaving better at higher densities indicates that such matrices can be explained relatively well using standard algebra. Capricorn and Cancer also have the highest factor sparsity, with Capricorn exhibiting a decrease in sparsity as the density of the input increases. This behaviour is desirable, since ideally we would prefer to find factors that are as close to the original ones as possible. For the NMF methods there is a trade-off between the reconstruction error and the sparsity of the factors: the algorithms that were worse at reconstruction tend to have sparser factors.

*Varying tropical noise* The amount of noise is always relative to the number of nonzero elements in the matrix; that is, for a matrix with *N* nonzero elements and noise level \(\alpha \), we flip \(\lfloor \alpha N \rfloor \) elements to random values. There are two versions of this experiment: one with factor density 30% and the other with 60%. In both cases we varied the noise level from 0 to 110% in increments of 10%. Figure 6b, c shows the respective reconstruction errors and Fig. 7b, c the corresponding sparsities of the obtained factors. In the low-density case, Capricorn is consistently the best method, with essentially perfect reconstruction for up to 80% noise. In the high-density case, however, the noise has more severe effects, and in particular beyond 60% noise Cancer, ALS, and all versions of NMF are better than Capricorn. The severity of the noise is, at least partially, explained by the fact that in denser data we flip more elements than in sparser data: for example, when the data matrices are full, at 50% noise we have already replaced half of the values in the matrices with random values. Further, the quick increase in Capricorn's reconstruction error strongly hints that the max-times structure of the data is mostly gone at these noise levels. Capricorn also produces clearly the sparsest factors in the low-density case, and is mostly tied with the sparsest NMF variants when the density is high. It should be noted, however, that the sparsest NMF variant generally has the highest reconstruction error among all the methods, which suggests that its sparse factors come at the cost of recovering little structure from the data.

*Varying rank with tropical noise* Here we test the effects of the (max-times) rank, with the assumption that higher-rank matrices are harder to reconstruct. The true max-times rank of the data was varied from 2 to 20 in increments of 2. There are three variations of this experiment: with 30% factor density and 10% noise (Fig. 6d), with 30% factor density and 50% noise (Fig. 6e), and with 60% factor density and 10% noise (Fig. 6f). The corresponding sparsities are shown in Fig. 7d–f. Capricorn has a clear advantage in all settings, obtaining nearly perfect reconstruction. Cancer is generally second best, except in the high-noise case, where it is mostly tied with a number of NMF methods; it also has a relatively high variance. To see why this happens, recall that Cancer always updates one element of the factor matrices at a time. This update depends entirely on the values of a single row (or column) and is sensitive to the spikes that tropical noise introduces into some elements. Interestingly, in the last two plots the reconstruction error actually drops for Cancer, ALS, and the NMF-based methods. This is a strong indication that at this point they can no longer extract meaningful structure from the data, and the improvement in reconstruction error is largely due to the uniformization of the data caused by high density and high noise levels.

*Varying Gaussian noise* Here we investigate how the algorithms respond to different levels of Gaussian noise, which was varied from 0 to 0.14 in increments of 0.01. The level of noise is the standard deviation of the Gaussian distribution used to generate the noise matrix as described earlier. The factor density was kept at 50%. The results are given in Figs. 8a (reconstruction error) and 9a (sparsity of factors).

Here Cancer is generally the best method in reconstruction error, and second in sparsity only to Capricorn. The only time it loses to any method is when there is no noise at all and Capricorn obtains a perfect decomposition. This is expected, since Capricorn is by design better at spotting pure subtropical structure.

*Varying density with Gaussian noise* In this experiment we studied what effect the density of the factor matrices used in data generation has on the algorithms' performance. For this purpose we varied the density from 10 to 100% in increments of 10% while keeping the other parameters fixed. There are two versions of this experiment, one with a low noise level of 0.01 (Figs. 8b, 9b), and a noisier case at 0.08 (Figs. 8c, 9c).

*Varying rank with Gaussian noise* The purpose of this test is to study the performance of the algorithms on data of different max-times ranks. We varied the true rank of the data from 2 to 20 in increments of 2. The factor density was fixed at 50% and the Gaussian noise at 0.01. The results are shown in Figs. 8d (reconstruction error) and 9d (sparsity of factors). The results are similar to those considered above, with Cancer returning the most accurate and second-sparsest factorizations.

*Optimizing the Jensen–Shannon divergence* By default Cancer optimizes the Frobenius reconstruction error, but this can be replaced by an arbitrary additive cost function. We performed experiments with the Jensen–Shannon divergence, which is given by the formula
$$\begin{aligned} {\textit{JS}}(\varvec{A} \,\Vert \, \tilde{\varvec{A}}) = \frac{1}{2} \sum _{i,j} \left( \varvec{A}_{ij} \log \frac{2\varvec{A}_{ij}}{\varvec{A}_{ij} + \tilde{\varvec{A}}_{ij}} + \tilde{\varvec{A}}_{ij} \log \frac{2\tilde{\varvec{A}}_{ij}}{\varvec{A}_{ij} + \tilde{\varvec{A}}_{ij}} \right) . \end{aligned}$$
(22)
It is easy to see that (22) is an additive function, and hence can be plugged into Cancer. Figure 10 shows how this version of Cancer compares to the other methods. The setup is the same as in the corresponding experiments of Fig. 8. In all these experiments it is apparent that this version of Cancer is inferior to the one optimizing the Frobenius error, but is generally on par with ALS and the NMF-based methods. In the varying-density test (Fig. 10b) it also produces better reconstruction errors than Capricorn and all the NMF methods until the density reaches 50%, after which they become tied.

*Prediction* In this experiment we choose a random holdout set and remove it from the data (the elements of this set are marked as missing values). We then try to learn the structure of the data from the remaining part, and finally test how well the algorithms predict the values inside the holdout set. The factors are drawn uniformly at random from the set of integers in an interval [0, *a*] with a predefined density of 30%, and then multiplied using the subtropical matrix product. We use two different values of *a*: 10 and 3. With \(a=10\) the input matrices have values in the range [0, 100], and when \(a=3\), the range is [0, 9]. We then apply noise to the obtained matrices and feed them to the algorithms. Since all input matrices are integer-valued, while the data recovered by the algorithms can be continuous-valued, we round the latter to the nearest integer. We report two measures of prediction quality: the prediction rate, defined as the fraction of correctly guessed values in the holdout set, and the root mean square error (RMSE). We tested this setup with both tropical noise (Fig. 11) and Gaussian noise (Fig. 12).
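The two quality measures of this experiment can be sketched as follows (rounding predictions to the nearest integer before scoring, as described above):

```python
import numpy as np

def prediction_scores(true_vals, pred_vals):
    """Prediction rate (fraction of exact hits after rounding) and RMSE
    over a holdout set, as used in the prediction experiment."""
    true_vals = np.asarray(true_vals, dtype=float)
    rounded = np.rint(np.asarray(pred_vals, dtype=float))
    rate = float(np.mean(rounded == true_vals))
    rmse = float(np.sqrt(np.mean((rounded - true_vals) ** 2)))
    return rate, rmse
```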

Open image in new window gives by far the best prediction rate when using the higher [0, 100] range of values in input matrices (Figs. 11a, 12a). Especially interesting is that it also beats all other methods in the presence of Gaussian noise. In terms of RMSE it generally lands somewhere in the middle of the pack among various NMF methods. Such a large difference between these measures is caused by Open image in new window not really being an approximation algorithm. It extracts subtropical patterns where they exist, while ignoring parts of the data where they cannot be found. This results in it either predicting the integer values exactly or missing by a wide margin. With the [0, 9] range of values the results of Open image in new window become worse, which is especially evident with Gaussian noise. Although this behaviour might seem counterintuitive, it is simply a consequence of noise having a larger effect when values in the data are smaller. Open image in new window shows the opposite behaviour to Open image in new window in that it benefits from smaller value range and Gaussian noise, where it consistently outperforms all other methods. Unlike Open image in new window , Open image in new window approximates values in input data, which allows it to get a high number of hits with the [0, 9] range after the rounding. On the [0, 100] interval though, it is liable to guessing many values incorrectly since a much higher level of precision is required. For many prediction tasks, like predicting user ratings, Open image in new window ’s approach seems more useful as input values are usually drawn from a relatively small range (for example, in Open image in new window , all ratings are from [0, 5]). Other competing methods generally do not perform well, with the exception of Open image in new window winning the first place with RMSE measure for the high range experiments (Figs. 11b, 12b). 
This illustrates once again that Open image in new window is a good approximation method, but that this does not help its prediction accuracy. In all other experiments the first place is held by either Open image in new window or Open image in new window . As a general guideline, when choosing between Open image in new window and Open image in new window for value prediction, one should consider that Open image in new window usually gives superior performance, while Open image in new window tends to be better at exactly guessing values drawn from a wider range.

*Discussion* The synthetic experiments confirm that both Open image in new window and Open image in new window are able to recover matrices with max-times structure. The main practical difference between them is that Open image in new window is designed to handle the tropical (flipping) noise, while Open image in new window is meant for data perturbed with white (Gaussian) noise. While Open image in new window is clearly the best method when the data has only flipping noise, being capable of tolerating very high noise levels, its results deteriorate when we apply Gaussian noise. Hence, when the exact type of noise is not known a priori, it is advisable to try both methods. It is also important to note that Open image in new window is actually a framework of algorithms, as it can optimize various objectives. To demonstrate this, we performed experiments with the Jensen–Shannon divergence as the objective and obtained results that, while inferior to those of Open image in new window optimizing the Frobenius error, are still slightly better than those of the remaining algorithms. Overall we can conclude that Open image in new window and the NMF-based methods generally cannot recover the structure of subtropical data; that is, we cannot use existing methods as a substitute for finding the max-times structure, neither for the reconstruction nor for the prediction tasks.

### 5.3 Real-world experiments

The main purpose of the real-world experiments is to study to what extent Open image in new window and Open image in new window can find max-times structure in various real-world datasets. Having established with the synthetic experiments that both algorithms are capable of finding such structure when it is present, here we look at the kinds of results they obtain on real-world data.

It is probably unrealistic to expect real-world datasets to have “pure” max-times structure, as in the synthetic experiments. Rather, we expect Open image in new window to be the best method (in terms of reconstruction error), and our algorithms to obtain reconstruction errors comparable to those of the NMF-based methods. We will also verify that the results on the real-world datasets are intuitive.
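For reference, the max-times product and the relative Frobenius reconstruction error used throughout can be sketched as follows (function names are ours):

```python
import numpy as np

def max_times(B, C):
    # Subtropical (max-times) product: the ordinary matrix product
    # with the sum replaced by the maximum,
    # (B x C)[i, j] = max_k B[i, k] * C[k, j].
    return np.max(B[:, :, None] * C[None, :, :], axis=1)

def relative_error(A, B, C):
    # Frobenius norm of the residual, normalized by the norm of A.
    return np.linalg.norm(A - max_times(B, C)) / np.linalg.norm(A)
```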

*The datasets*

Open image in new window represents a linear program.^{5} It is available from the University of Florida Sparse Matrix Collection^{6} (Davis and Hu 2011).

Open image in new window is a brute force disjoint product matrix in tree algebra on *n* nodes.^{7} It can be obtained from the same repository as Open image in new window .

Open image in new window contains weather records for various locations in Europe (full description can be found in Sect. 1).

Open image in new window is a nerdiness personality test that uses different attributes to determine the level of nerdiness of a person.^{8} It contains answers by 1418 respondents to a set of 36 questions that asked them to self-assess various statements about themselves on a scale of 1 to 7. We preprocessed the input matrix by dividing each column by its standard deviation and subtracting its mean. To make sure that the data is nonnegative, we then subtracted the smallest value of the obtained normalized matrix from each of its elements.
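The preprocessing just described can be sketched as (our own function name):

```python
import numpy as np

def standardize_nonneg(X):
    # Divide each column by its standard deviation and subtract
    # the column mean ...
    Z = np.asarray(X, dtype=float)
    Z = Z / Z.std(axis=0)
    Z = Z - Z.mean(axis=0)
    # ... then shift the whole matrix so its smallest entry is 0.
    return Z - Z.min()
```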

Open image in new window is a subset of the Extended Yale Face collection of face images (Georghiades et al. 2000). It consists of 32-by-32 pixel images under different lighting conditions. We used preprocessed data by Xiaofei He et al.^{9} We selected a subset of pictures with lighting from the left, and then preprocessed the input matrix by first subtracting from every column its smallest element and then dividing the column by its standard deviation.

Open image in new window is a subset of the 20Newsgroups dataset,^{10} containing the usage of 800 words over 400 posts for 4 newsgroups.^{11} Before running the algorithms we represented the dataset as a TF-IDF matrix, and then scaled it by dividing each entry by the greatest entry in the matrix.

Open image in new window is a land registry house price index.^{12} Rows represent months, columns are locations, and entries are residential property price indices. We preprocessed the data by first dividing each column by its standard deviation and then subtracting its minimum, so that each column has minimum 0.

Open image in new window is a collection of user ratings for a set of movies. The original dataset^{13} consists of 100 000 ratings from 1000 users on 1700 movies, with ratings ranging from 1 to 5. In order to be able to perform cross-validation on it, we had to preprocess Open image in new window by removing users that rated fewer than 10 movies and movies that were rated fewer than 5 times. After that we were left with 943 users, 1349 movies, and 99 287 ratings.

Real-world dataset properties

Dataset | Rows | Columns | Density (%) |
---|---|---|---|
Open image in new window | 9825 | 5411 | 1.1 |
Open image in new window | 2726 | 551 | 10.0 |
Open image in new window | 2575 | 48 | 99.9 |
Open image in new window | 1418 | 36 | 99.6 |
Open image in new window | 1024 | 222 | 97.0 |
Open image in new window | 400 | 800 | 3.5 |
Open image in new window | 253 | 177 | 99.5 |
Open image in new window | 943 | 1349 | 7.8 |

#### 5.3.1 Quantitative results: reconstruction error, sparsity, convergence, and runtime

Since there is no ground truth for these datasets, the ranks are chosen based mainly on the size of the data and our intuition about what the true rank should be.^{14} Open image in new window is, as expected, consistently the best method, followed by Open image in new window and Open image in new window . Open image in new window generally lands in the middle of the pack of the NMF methods, which suggests that it is capable of finding max-times structure that is comparable to what the NMF-based methods provide. Consequently, we can study the max-times structure found by Open image in new window , knowing that it is (relatively) accurate. On the other hand, Open image in new window has a high reconstruction error. The discrepancy between Open image in new window ’s and Open image in new window ’s results indicates that the datasets used cannot be represented using “pure” subtropical structure. Rather, they are either a mix of NMF and subtropical patterns, or have relatively high levels of continuous noise.

Reconstruction error for various real-world datasets

\(k=\) | 10 | 10 | 40 | 20 | 15 | 10 | 25 | 25 |
---|---|---|---|---|---|---|---|---|
Open image in new window | 0.071 | 0.240 | 0.204 | 0.556 | 0.027 | 0.756 | 0.864 | 0.813 |
Open image in new window | 0.392 | 0.395 | 0.972 | 0.987 | 0.217 | 1.003 | 0.998 | 0.912 |
Open image in new window | 0.046 | 0.225 | 0.178 | 0.546 | 0.023 | 0.745 | 0.841 | 0.749 |
Open image in new window | 0.087 | 0.227 | 0.313 | 0.538 | 0.074 | 0.749 | 0.828 | 0.733 |
Open image in new window | 0.122 | 0.226 | 0.294 | 1.000 | 0.045 | 0.748 | 0.827 | 0.733 |
Open image in new window | 0.081 | 0.233 | 0.291 | 1.000 | 0.063 | 0.748 | 0.826 | 0.733 |
Open image in new window | 0.034 | 0.221 | 0.169 | 0.545 | 0.021 | 0.741 | 0.824 | 0.733 |
Open image in new window | 0.025 | 0.209 | 0.140 | 0.533 | 0.015 | 0.728 | 0.802 | 0.722 |

Factor sparsity for various real-world datasets

\(k=\) | 10 | 10 | 40 | 20 | 15 | 10 | 25 | 25 |
---|---|---|---|---|---|---|---|---|
Open image in new window | 0.645 | 0.528 | 0.571 | 0.812 | 0.422 | 0.666 | 0.838 | 0.951 |
Open image in new window | 0.795 | 0.733 | 0.949 | 0.991 | 0.685 | 0.957 | 0.988 | 0.978 |
Open image in new window | 0.383 | 0.330 | 0.403 | 0.499 | 0.226 | 0.543 | 0.758 | 0.738 |
Open image in new window | 0.226 | 0.120 | 0.434 | 0.513 | 0.331 | 0.420 | 0.573 | 0.634 |
Open image in new window | 0.275 | 0.117 | 0.480 | 1.000 | 0.729 | 0.438 | 0.681 | 0.748 |
Open image in new window | 0.549 | 0.189 | 0.648 | 1.000 | 0.622 | 0.481 | 0.743 | 0.811 |

The average runtime in seconds and standard deviation of the algorithms for various real-world datasets

Method | | | | |
---|---|---|---|---|
Open image in new window | \(20116.000 \pm 15.14\) | \(6023.000 \pm 25.00\) | \(25520.000 \pm 60.44\) | \(924.000 \pm 8.00\) |
Open image in new window | \(205.870 \pm 1.39\) | \(87.000 \pm 1.30\) | \(165.960 \pm 7.12\) | \(41.000 \pm 0.72\) |
Open image in new window | \(115.100 \pm 0.53\) | \(72.000 \pm 1.50\) | \(195.570 \pm 1.76\) | \(64.000 \pm 0.51\) |
Open image in new window | \(0.194 \pm 0.08\) | \(0.374 \pm 0.14\) | \(3.649 \pm 1.45\) | \(0.156 \pm 0.12\) |
Open image in new window | \(\mathbf{0.187 } \pm 0.02\) | \(0.280 \pm 0.04\) | \(4.684 \pm 0.74\) | \(0.309 \pm 0.13\) |
Open image in new window | \(2.240 \pm 0.20\) | \(1.201 \pm 0.11\) | \(5.164 \pm 0.87\) | \(1.288 \pm 0.10\) |
Open image in new window | \(0.598 \pm 0.04\) | \(\mathbf{0.155 } \pm 0.04\) | \(\mathbf{0.142 } \pm 0.04\) | \(\mathbf{0.027 } \pm 0.01\) |

*Prediction*

Here we investigate how well Open image in new window and Open image in new window can predict missing values in the data. We used three real-world datasets: a user–movie rating matrix Open image in new window , a brute force disjoint product matrix in tree algebra Open image in new window , and Open image in new window , which represents a linear program. All these matrices are integer-valued, and hence we also round the results of all methods to the nearest integer. We compare the results of our methods against Open image in new window and Open image in new window . The choice of Open image in new window is motivated by its ability to ignore missing elements in the input data and its generally good performance in the previous tests. There is only one caveat: Open image in new window sometimes produces very high spikes for some elements of the matrix. These spikes do not cause much of a problem for prediction, but they seriously deteriorate the results of Open image in new window with respect to various distance measures. For this reason we always ignore such elements. While this comparison method is obviously not completely fair towards the other methods, it can serve as a rough upper bound on what performance is possible with NMF-based algorithms. Comparing against the other methods is likewise not entirely fair, as they are not designed to deal with missing values, but we still present the results of Open image in new window for completeness.

On Open image in new window we perform standard cross-validation tests, where a random selection of elements is chosen as a holdout set and removed from the data. The data has 943 users, each having rated from 19 to 648 movies. A holdout set is chosen by sampling uniformly at random 5 ratings from each user. We run the algorithms, while treating the elements from the holdout set as missing values, and then compare the reconstructed matrices to the original data. This procedure is repeated 10 times.

To get a more complete view on how good the predictions are, we report various measures of quality: Frobenius error, root mean square error (RMSE), reciprocal rank, Spearman’s \(\rho \), mean absolute error (MAE), Jensen–Shannon divergence (JS), optimistic reciprocal rank, Kendall’s \(\tau \), and prediction accuracy. The prediction accuracy allows us to see if the methods are capable of recovering the missing user ratings. The remaining tests can be divided into two categories. The first one, which comprises Frobenius error, root mean square error, mean absolute error, and Jensen–Shannon divergence, aims to quantify the distance between the original data and the reconstructed matrix. The second group of tests finds the correlation between rankings of movies for each user. It includes Spearman’s \(\rho \), Kendall’s \(\tau \), reciprocal rank, and optimistic reciprocal rank.
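For the correlation measures, tie-free textbook versions of Spearman’s \(\rho \) and Kendall’s \(\tau \) can be computed directly (a sketch with our own function names; a statistics library that handles ties properly would be used in practice):

```python
import numpy as np

def spearman_rho(x, y):
    # Spearman's rho via the classic formula 1 - 6*sum(d^2)/(n(n^2-1));
    # valid when there are no ties.
    rx = np.argsort(np.argsort(x))   # 0-based ranks of x
    ry = np.argsort(np.argsort(y))   # 0-based ranks of y
    n = len(x)
    return 1.0 - 6.0 * np.sum((rx - ry) ** 2) / (n * (n ** 2 - 1))

def kendall_tau(x, y):
    # Kendall's tau by explicit pair counting (no ties assumed):
    # tau = (concordant - discordant) / total pairs.
    n = len(x)
    concordant = sum(
        (x[i] - x[j]) * (y[i] - y[j]) > 0
        for i in range(n) for j in range(i + 1, n)
    )
    pairs = n * (n - 1) // 2
    return (2 * concordant - pairs) / pairs
```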

Denote by \(U\) the set of all users. In the following, for each user \(u\in U\) we only consider the set of movies \(M(u)\) that this user has rated and that belong to the holdout set. The ratings by user \(u\) induce a natural ranking on \(M(u)\). On the other hand, the algorithms produce approximations \(r'(u, m)\) to the true ratings \(r(u, m)\), which also induce a corresponding ranking of the movies. The reciprocal rank is a convenient way of comparing the rankings obtained by the algorithms to the original one. For any user \(u \in U\), denote by \(H(u)\) the set of movies that this user ranked the highest (that is, \(H(u) = \lbrace m\in M(u) \, : \, r(u, m) = \max _{m'\in M(u)} r(u, m') \rbrace \)). The reciprocal rank for user \(u\) is now defined as

$$\begin{aligned} rr(u) = \max _{m \in H(u)} \frac{1}{R(u, m)}, \end{aligned}$$
(23)

where \(R(u, m)\) is the rank of the movie \(m\) within \(M(u)\) according to the rating approximations given by the algorithm in question. The mean reciprocal rank is then defined as the average of the reciprocal ranks of the individual users. When computing the ranks \(R(u, m)\), all tied elements receive the same rank, which is computed by averaging. That means that if, say, movies \(m_1\) and \(m_2\) have tied ranks of 2 and 3, then they both receive the rank of 2.5. An alternative way is to always assign the smallest possible rank; in the above example both \(m_1\) and \(m_2\) would receive rank 2. When the ranks \(R(u, m)\) are computed like this, equation (23) defines the optimistic reciprocal rank.
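The two tie-handling variants can be sketched as follows (our own function; we read the definition as taking the best reciprocal rank over the top-rated set \(H(u)\), which is an assumption on our part):

```python
import numpy as np

def reciprocal_rank(true_ratings, predicted_ratings, optimistic=False):
    true = np.asarray(true_ratings, dtype=float)
    pred = np.asarray(predicted_ratings, dtype=float)
    # Rank 1 = highest predicted rating.
    order = np.argsort(-pred)
    ranks = np.empty(len(pred))
    ranks[order] = np.arange(1, len(pred) + 1)
    # Tied predictions: average the ranks in each tie group, or take
    # the smallest rank in the group for the optimistic variant.
    for v in np.unique(pred):
        tie = pred == v
        ranks[tie] = ranks[tie].min() if optimistic else ranks[tie].mean()
    top = true == true.max()          # the set H(u) of top-rated movies
    return float(np.max(1.0 / ranks[top]))
```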

Comparison between the predictive power of different methods on the Open image in new window data

Frobenius, value \((\downarrow )\) | *p* | RMSE, value \((\downarrow )\) | *p* |
---|---|---|---|
\(\mathbf {0.2876}\pm 0.003\) | | \(\mathbf {1.0802} \pm 0.011\) | |
\(0.6993\pm 0.024\) | 0.0001 | \(2.6267 \pm 0.085\) | 0.0001 |
\(0.2989 \pm 0.003\) | 0.0001 | \(1.1227 \pm 0.012\) | 0.0001 |
\(0.7336 \pm 0.002\) | 0.0001 | \(2.7558 \pm 0.014\) | 0.0001 |

Recip. rank, value \((\uparrow )\) | *p* | Spearman’s \(\rho \), value \((\uparrow )\) | *p* |
---|---|---|---|
\(\mathbf {0.7451} \pm 0.010\) | | \(0.3071 \pm 0.015\) | 0.5749 |
\(0.5601\pm 0.017\) | 0.0001 | \(0.2354 \pm 0.017\) | 0.0001 |
\(0.7395 \pm 0.004\) | 0.0521 | \(\mathbf {0.3084} \pm 0.012\) | |
\(0.7217 \pm 0.008\) | 0.0004 | \(0.2445 \pm 0.013\) | 0.0001 |

MAE, value \((\downarrow )\) | *p* | JS, value \((\downarrow )\) | *p* |
---|---|---|---|
\(\mathbf {0.8203} \pm 0.008\) | | \(\mathbf {0.0201} \pm 0.000\) | |
\(2.0518 \pm 0.106\) | 0.0001 | \(0.2826 \pm 0.026\) | 0.0001 |
\(0.8555 \pm 0.008\) | 0.0001 | \(0.0209 \pm 0.000\) | 0.0057 |
\(2.4756 \pm 0.014\) | 0.0001 | \(0.1153 \pm 0.001\) | 0.0001 |

Recip. rank opt., value \((\uparrow )\) | *p* | Kendall’s \(\tau \), value \((\uparrow )\) | *p* |
---|---|---|---|
\(0.7451\pm 0.010\) | 0.0001 | \(0.2659 \pm 0.013\) | 0.4251 |
\(\mathbf {0.8547}\pm 0.010\) | | \(0.2127 \pm 0.016\) | 0.0001 |
\(0.7395 \pm 0.004\) | 0.0001 | \(\mathbf {0.2679} \pm 0.010\) | |
\(0.7217 \pm 0.008\) | 0.0001 | \(0.2111 \pm 0.012\) | 0.0001 |

Accuracy, value \((\uparrow )\) | *p* |
---|---|
\(\mathbf {0.3968}\pm 0.008\) | |
\(0.2053 \pm 0.019\) | 0.0001 |
\(0.3828\pm 0.006\) | 0.0011 |
\(0.0588 \pm 0.003\) | 0.0001 |

For each test, Table 5 shows the mean and the standard deviation of the results of each algorithm. In addition we report the *p*-value based on the Wilcoxon signed-rank test. It shows if an advantage of one method over another is statistically significant. We say that a method *A* is significantly better than method *B* if the *p*-value is \(<0.05\). It is unreasonable to report the *p*-value for every method pair—instead we only show *p*-values involving the best method. For each method, the value given next to it is the *p*-value for this method and the best method.
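A sketch of this significance check on per-fold errors, using SciPy’s paired Wilcoxon signed-rank test (the helper name and the lower-mean convention are ours):

```python
import numpy as np
from scipy.stats import wilcoxon

def significantly_better(errors_a, errors_b, alpha=0.05):
    # Paired Wilcoxon signed-rank test on per-fold (or per-run) errors.
    stat, p = wilcoxon(errors_a, errors_b)
    # Method A is significantly better than B if p < alpha and A's
    # mean error is lower.
    return bool(p < alpha and np.mean(errors_a) < np.mean(errors_b)), p
```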

For the ranking tests, the *p* values are quite high. In summary, our experiments show that Open image in new window is significantly better in tests that measure the direct distance between the original and the reconstructed matrices, as well as in prediction accuracy, whereas for the ranking experiments it is difficult to give any of the algorithms an edge.

Comparison between the predictive power of different methods on the Open image in new window data

Frobenius, value \((\downarrow )\) | *p* | RMSE, value \((\downarrow )\) | *p* |
---|---|---|---|
\(0.4824 \pm 0.016\) | 0.0040 | \(2.3124 \pm 0.085\) | 0.0040 |
\(0.7827 \pm 0.023\) | 0.0040 | \(3.7521 \pm 0.131\) | 0.0040 |
\(\mathbf {0.4374} \pm 0.006\) | | \(\mathbf {2.0925} \pm 0.041\) | |
\(0.6005 \pm 0.003\) | 0.0040 | \(2.8784 \pm 0.032\) | 0.0040 |

MAE, value \((\downarrow )\) | *p* | JS, value \((\downarrow )\) | *p* |
---|---|---|---|
\(1.5852 \pm 0.050\) | 0.0040 | \(0.0675\pm 0.005\) | 0.0040 |
\(2.3871\pm 0.099\) | 0.0040 | \(0.2929 \pm 0.028\) | 0.0040 |
\(\mathbf {1.2138} \pm 0.011\) | | \(\mathbf {0.0367}\pm 0.000\) | |
\(1.8413 \pm 0.013\) | 0.0040 | \(0.0786 \pm 0.001\) | 0.0040 |

Accuracy, value \((\uparrow )\) | *p* |
---|---|
\(0.2315\pm 0.010\) | 0.0040 |
\(0.1918\pm 0.019\) | 0.0040 |
\(\mathbf {0.3996}\pm 0.004\) | |
\(0.2061\pm 0.002\) | 0.0040 |

The *p*-value is 0.0040 for all metrics, which is a consequence of the particular number of folds (5) that we used. The fact that we see this number everywhere in the table simply indicates that Open image in new window was better than every other method on every fold with respect to all measures. With Open image in new window the roles are reversed, and this time Open image in new window is clearly the best method, winning according to all metrics and on all folds, just as Open image in new window did on Open image in new window .

Comparison between the predictive power of different methods on the Open image in new window data

Frobenius, value \((\downarrow )\) | *p* | RMSE, value \((\downarrow )\) | *p* |
---|---|---|---|
\(\mathbf {0.3690} \pm 0.018\) | | \(\mathbf {1.1462} \pm 0.065\) | |
\(0.5741 \pm 0.054\) | 0.0040 | \(1.7822 \pm 0.161\) | 0.0040 |
\(0.4113 \pm 0.014\) | 0.0040 | \(1.2748 \pm 0.038\) | 0.0040 |
\(0.5003 \pm 0.002\) | 0.0040 | \(1.5534 \pm 0.019\) | 0.0040 |

MAE, value \((\downarrow )\) | *p* | JS, value \((\downarrow )\) | *p* |
---|---|---|---|
\(\mathbf {0.3286} \pm 0.014\) | | \(\mathbf {0.0228} \pm 0.001\) | |
\(0.6712\pm 0.094\) | 0.0040 | \(0.1208 \pm 0.037\) | 0.0040 |
\(0.3932 \pm 0.006\) | 0.0040 | \(0.0268\pm 0.000\) | 0.0040 |
\(0.9391 \pm 0.006\) | 0.0040 | \(0.0919 \pm 0.000\) | 0.0040 |

Accuracy, value \((\uparrow )\) | *p* |
---|---|
\(\mathbf {0.8841}\pm 0.002\) | |
\(0.7111\pm 0.050\) | 0.0040 |
\(0.8562\pm 0.001\) | 0.0040 |
\(0.2837\pm 0.002\) | 0.0040 |

#### 5.3.2 Interpretability of the results

The crux of using max-times factorizations instead of standard (nonnegative) ones is that the factors (are supposed to) exhibit the “winner-takes-it-all” structure instead of the “parts-of-whole” structure. To demonstrate this, we analysed the results on four different datasets: Open image in new window , Open image in new window , Open image in new window , and Open image in new window . The Open image in new window dataset is explained below.

We plotted the left factor matrices for the Open image in new window data for Open image in new window and Open image in new window in Fig. 14. At first, it might look like Open image in new window provides more interpretable results, as most factors are easily identifiable as faces. This, however, is not a very interesting result: we already knew that the data has faces, and many factors in Open image in new window ’s result are simply some kind of “prototypical” faces. The results of Open image in new window are harder to identify at first sight. Upon closer inspection, though, one can see that they identify areas that are lighter in the different images, that is, have higher grayscale values. These factors tell us about the variation of the lighting across the different photos, and can reveal information we did not know a priori. In addition, almost every one of Open image in new window ’s factors contains one or two main features of the face (such as the nose, left eye, or right cheek). In other words, while NMF’s patterns are for the most part close to fully formed faces, Open image in new window finds independent fragments that indicate the direction of the lighting and/or contain some of the main features of a face.

Top three attributes for the first two factors of Open image in new window

Factor 1 | Factor 2 |
---|---|

I am more comfortable with my hobbies | I have played a lot of video games |

than I am with other people | |

I gravitate towards introspection | I collect books |

I sometimes prefer fictional people to real ones | I care about super heroes |

In order to interpret Open image in new window we first observe that each column represents a single personality attribute. Denote by Open image in new window the obtained approximation of the original matrix. For each rank-1 factor Open image in new window and each column Open image in new window we define the score \(\sigma (i)\) as the number of elements in Open image in new window that are determined by Open image in new window . By sorting attributes in descending order of \(\sigma (i)\) we obtain relative rankings of the attributes for a given factor. The results are shown in Table 8. The first factor clearly shows introverted tendencies, while the second one can be summarized as having interests in fiction and games.
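The score \(\sigma (i)\) can be computed directly from the factors: an entry of the reconstruction is determined by factor \(i\) when that factor attains the maximum there. A sketch (names are ours):

```python
import numpy as np

def dominance_scores(B, C):
    k = B.shape[1]
    # patterns[i] = outer product of the i-th column of B and the
    # i-th row of C (the i-th rank-1 factor).
    patterns = np.stack([np.outer(B[:, i], C[i, :]) for i in range(k)])
    # winner[r, c] = index of the factor attaining the max at (r, c).
    winner = np.argmax(patterns, axis=0)
    # sigma[i, j] = number of entries of column j determined by factor i.
    sigma = np.stack([(winner == i).sum(axis=0) for i in range(k)])
    return sigma
```

Sorting the columns of a row of `sigma` in descending order then gives the per-factor attribute ranking used for Table 8.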

Figure 15 shows all of the factors for the Open image in new window data, as obtained by Open image in new window and Open image in new window (the best NMF method in Table 2). Figure 15a, b shows the left-hand sides of the factors found by Open image in new window and Open image in new window , respectively, plotted on the map. Darker colours indicate higher values, which can be interpreted as “more important”. The right-hand side factors are presented in Fig. 15c, d, respectively. Here, each row corresponds to a factor, and each column to a single observation column from the original data (that is, columns 1–12 represent average low temperatures for each month, columns 13–24 average high temperatures, columns 25–36 daily means, and columns 37–48 average monthly precipitation). Again, higher values can be seen as having more importance. Recall that a pattern is formed by taking the outer product of a single left-hand factor and the corresponding right-hand factor. It is easy to see that the largest (and thus the most important) values in a pattern are those that are products of high values in both the left-hand side and the right-hand side factors.

The Open image in new window factors have fewer high values (dark colours; all factors are normalized to the unit interval). For Open image in new window , there are more large values in each factor. This highlights the difference between the subtropical and the normal algebra: in the normal algebra, if you sum two large values, the result is even larger, whereas in the subtropical algebra the result is no larger than the largest of the summands. For decompositions this means that Open image in new window cannot have overlapping high values in its factors; instead it has to split its factors into mostly non-overlapping parts. Open image in new window , on the other hand, can have overlap, and hence its factors can share some phenomena. For instance, the seventh factor of Open image in new window clearly indicates areas of high precipitation (cf. Fig. 1). The same phenomenon is split into many factors by Open image in new window (at least the third, sixth, and seventh factors), mostly explaining areas with higher precipitation at different parts of the year. While many elements in the right-hand side factors of Open image in new window are nonzero, that does not mean that all of them are of equal importance: some of them are dominated by larger features and do not influence the final outcome. Generally, since larger values are more likely to make a contribution than smaller ones, they should be considered more important when interpreting the data.
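A toy example makes the difference concrete: when two rank-1 patterns overlap, the ordinary sum doubles the overlapping value, while the subtropical sum leaves it unchanged:

```python
import numpy as np

# Two rank-1 patterns that overlap in the middle entry.
p1 = np.outer([1.0, 1.0, 0.0], [1.0, 1.0, 0.0])
p2 = np.outer([0.0, 1.0, 1.0], [0.0, 1.0, 1.0])

standard = p1 + p2                 # ordinary algebra: overlap adds up to 2
subtropical = np.maximum(p1, p2)   # max-times algebra: overlap stays at 1
```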

The Open image in new window dataset^{15} was obtained from the original binary location–species matrix (see Mitchell-Jones et al. 1999) by multiplying it with its transpose and then normalizing the result by dividing each column by its maximal element. The obtained matrix has 2670 rows and columns and density 91%. Due to its special nature, we use it only in this experiment, to provide intuition about subtropical factorizations.

The factors obtained by Open image in new window with the Open image in new window data are depicted in Fig. 16, where we can see that many of these factors cover the central parts of the European Plain, extending a bit south to cover most of Germany. There are, naturally, many mammal species that inhabit the whole European Plain, and the east–west change is gradual. This gradual change is easier to model in the subtropical algebra, as we do not have to worry about the sums of the factors getting too large. Factors 1–6 model various aspects of the east–west change, emphasizing either the south-west, central, or eastern parts of the plain. Similarly, the ninth factor explains mammal species found in the UK and southern Scandinavia, while the tenth factor covers species found in Scotland, Scandinavia, and the Baltic countries, indicating that these areas have roughly the same biome. If we compare these results to those of Open image in new window (Fig. 17), it becomes evident that the latter tries to find relatively disjoint factors and avoids factor overlap whenever possible. This is because in NMF any feature that is nonzero at a given data point is always “active”, in the sense that it contributes to the final value. That being said, Open image in new window does find some interesting patterns, such as the rather distinct factors representing France and the Scandinavian Peninsula.

## 6 Related work

Here we present earlier research that is related to the subtropical matrix factorization. We start by discussing classic methods, such as SVD and NMF, that have long been used for various data analysis tasks, and then continue with approaches that use idempotent structures. Since the tropical algebra is very closely related to the subtropical algebra, and since there has been a lot of research on it, we dedicate the last subsection to discuss it in more detail.

### 6.1 Matrix factorization in data analysis

Matrix factorization methods play a crucial role in data analysis as they help to find low-dimensional representations of the data and uncover the underlying latent structure. A classic example of a real-valued matrix factorization is the singular value decomposition (SVD) (see e.g. Golub and Van Loan 2012), which is very well known and finds extensive applications in various disciplines, such as signal processing and natural language processing. The SVD of a real *n*-by-*m* matrix Open image in new window is a factorization of the form Open image in new window where Open image in new window and Open image in new window are orthogonal matrices, and Open image in new window is a rectangular diagonal matrix with nonnegative entries. An important property of SVD is that it provides the best low-rank approximation of a given matrix with respect to the Frobenius norm (Golub and Van Loan 2012), giving rise to the so called truncated SVD. This property is frequently used to separate important parts of data from the noise. For example, it was used by Jha and Yadava (2011) to remove the noise from sensor data in electronic nose systems. Another prominent usage of the truncated SVD is in dimensionality reduction (see for example Sarwar et al. 2000; Deerwester et al. 1990).
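The truncated SVD is a few lines in any linear-algebra library; a NumPy sketch (function name is ours):

```python
import numpy as np

def truncated_svd(A, k):
    # Best rank-k approximation of A in the Frobenius norm
    # (Eckart-Young theorem), built from the top k singular triplets.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]
```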

Despite SVD being so ubiquitous, there are some restrictions to its usage in data mining due to the possible presence of negative elements in the factors. In many applications negative values are hard to interpret, and thus other methods have to be used. Nonnegative matrix factorization (NMF) is a way to tackle this problem. For a given nonnegative real matrix Open image in new window , the NMF problem is to find a decomposition of Open image in new window into two matrices Open image in new window such that Open image in new window and Open image in new window are also nonnegative. Its applications are extensive and include text mining (Pauca et al. 2004), document clustering (Xu et al. 2003), pattern discovery (Brunet et al. 2004), and many others. This area drew considerable attention after a publication by Lee and Seung (1999), where they provided an efficient algorithm for solving the NMF problem. It is worth mentioning that even though the paper by Lee and Seung is perhaps the most famous in the NMF literature, it was not the first one to consider this problem. Earlier works include Paatero and Tapper (1994) (see also Paatero 1997), Paatero (1999), and Cohen and Rothblum (1993). Berry et al. (2007) provide an overview of NMF algorithms and their applications. There exist various flavours of NMF that impose different constraints on the factors; for example, Hoyer (2004) used sparsity constraints. Though both NMF and SVD perform approximations of a fixed rank, there are also other ways to enforce a compact representation of data. For example, in maximum-margin matrix factorization, constraints are imposed on the norms of the factors. This approach was exploited by Srebro et al. (2004), who showed it to be a good method for predicting unobserved values in a matrix. The authors also indicate that posing constraints on the factor norms, rather than on the rank, yields a convex optimization problem, which is easier to solve.

### 6.2 Idempotent semirings

The concept of the subtropical algebra is relatively new, and as far as we know, its applications in data mining are not yet well studied. Indeed, its only usage for data analysis that we are aware of was by Weston et al. (2013), where it was used as a part of a model for collaborative filtering. The authors modeled users as a set of vectors, where each vector represents a single aspect about the user (e.g. a particular area of interest). The ratings are then reconstructed by selecting the highest scoring prediction using the \(\max \) operator. Since their model uses \(\max \) as well as the standard plus operation, it stands on the border between the standard and the subtropical worlds.

Boolean algebra, despite being limited to the binary set \(\{0,1\}\), is related to the subtropical algebra by virtue of having the same operations, and is thus a restriction of the latter to \(\{0,1\}\). By the same token, when both factor matrices are binary, their subtropical product coincides with the Boolean product, and hence the Boolean matrix factorization can be seen as a degenerate case of the subtropical matrix factorization problem. The dioid properties of the Boolean algebra can be checked trivially. The motivation for the Boolean matrix factorization comes from the fact that in many applications data is naturally represented as a binary matrix (e.g. transaction databases), which makes it reasonable to seek decompositions that preserve the binary character of the data. The conceptual and algorithmic analysis of the problem was done by Miettinen (2009), with the focus mainly on the data mining perspective of the problem. For a linear algebra perspective see Kim (1982), where the emphasis is put on the existence of exact decompositions. A number of algorithms have been proposed for solving the BMF problem (Miettinen et al. 2008; Lu et al. 2008; Lucchese et al. 2014; Karaev et al. 2015).
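The observation that the Boolean product is the restriction of the subtropical product to \(\{0,1\}\) is easy to verify numerically (function names are ours):

```python
import numpy as np

def max_times(B, C):
    # Subtropical product: sum replaced by max.
    return np.max(B[:, :, None] * C[None, :, :], axis=1)

def boolean_product(B, C):
    # Boolean product of 0/1 matrices: OR of element-wise ANDs.
    return (B.astype(int) @ C.astype(int) > 0).astype(int)

B = np.array([[1, 0], [1, 1]])
C = np.array([[0, 1], [1, 0]])
# For binary inputs the two products coincide.
assert np.array_equal(max_times(B, C), boolean_product(B, C))
```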

### 6.3 Tropical algebra

Another close cousin of the max-times algebra is the max-plus, or so called tropical algebra, which uses plus in place of multiplication. It is also a dioid due to the idempotent nature of the \(\max \) operation. As was mentioned earlier, the two algebras are isomorphic, and hence many of the properties are identical (see Sects. 2 and 3 for more details).
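The isomorphism is simply the element-wise logarithm, which turns the max-times product into the max-plus product; a quick numerical check (function names are ours):

```python
import numpy as np

def max_times(B, C):
    # Max-times (subtropical) product.
    return np.max(B[:, :, None] * C[None, :, :], axis=1)

def max_plus(B, C):
    # Max-plus (tropical) product: (B + C)[i, j] = max_k (B[i,k] + C[k,j]).
    return np.max(B[:, :, None] + C[None, :, :], axis=1)

B = np.array([[1.0, 2.0], [3.0, 4.0]])
C = np.array([[5.0, 6.0], [7.0, 8.0]])
# log is monotone, so it commutes with max and turns * into +.
lhs = np.log(max_times(B, C))
rhs = max_plus(np.log(B), np.log(C))
```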

Despite the theory of the tropical algebra being relatively young, it has been thoroughly studied in recent years. The reason for this is that it finds extensive applications in various areas of mathematics and other disciplines. An example of such a field is the discrete event systems (DES) (Cassandras and Lafortune 2008), where the tropical algebra is ubiquitously used for modeling (see e.g. Baccelli et al. 1992; Cohen et al. 1999). Other mathematical disciplines where the tropical algebra plays a crucial role are optimal control (Gaubert 1997), asymptotic analysis (Dembo and Zeitouni 2010; Maslov 1992; Akian 1999), and decidability (Simon 1978, 1994).

Research on tropical matrix factorization is of interest to us because of the above-mentioned isomorphism between the two algebras. However, as was explained in Sect. 3, approximate matrix factorizations are not directly transferable, as the errors can differ dramatically. It should be mentioned that in the general case the tropical matrix factorization problem is NP-hard (see e.g. Shitov 2014). De Schutter and De Moor (2002) demonstrated that if the max-plus algebra is extended so that every element has an additive inverse, then many of the standard matrix decomposition problems can be solved. Among other results, the authors obtained max-plus analogues of QR and SVD. They also claimed that their techniques can be readily extended to other classic factorizations (e.g. Hessenberg and LU decompositions).

For equations of the form \(A\mathbf{x} = \mathbf{b}\), feasibility can be established, for example, through so-called *matrix residuation*. There is a general result that for an *n*-by-*m* matrix \(A\) over a complete idempotent semiring, the existence of a solution can be checked in *O*(*nm*) time (see Gaubert 1997). Although the tropical algebra is not complete, there is an efficient way of determining whether a solution exists (Cuninghame-Green 1979; Zimmermann 2011). It was shown by Butkovič (2003) that this type of tropical equation is equivalent to the set cover problem, which is known to be NP-hard. This directly affects the max-times algebra through the above-mentioned isomorphism and makes precisely solving max-times linear systems of the form \(A\mathbf{x} = \mathbf{b}\) infeasible in high dimensions.
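To illustrate the residuation idea in the max-times setting: for a strictly positive matrix \(A\), the greatest vector \(\bar{\mathbf{x}}\) with \(A\bar{\mathbf{x}} \le \mathbf{b}\) is given elementwise by \(\bar{x}_j = \min _i b_i / a_{ij}\), and the system \(A\mathbf{x} = \mathbf{b}\) is solvable exactly when \(A\bar{\mathbf{x}} = \mathbf{b}\). A minimal sketch, with our own helper names, assuming strictly positive entries:

```python
import numpy as np

def maxtimes_vec(A, x):
    # (A (x) x)_i = max_j A[i, j] * x[j]  -- max-times matrix-vector product
    return np.max(A * x[None, :], axis=1)

def greatest_subsolution(A, b):
    # Residuation: the largest x with A (x) x <= b, elementwise
    # x[j] = min_i b[i] / A[i, j]  (requires A > 0).
    return np.min(b[:, None] / A, axis=0)

# A hand-picked feasible example.
A = np.array([[2.0, 1.0], [1.0, 4.0]])
b = np.array([4.0, 8.0])

x_bar = greatest_subsolution(A, b)
feasible = np.allclose(maxtimes_vec(A, x_bar), b)  # True: x_bar attains b
print(x_bar, feasible)
```

Each entry of the residuated solution costs one pass over a column, giving the *O*(*nm*) feasibility check mentioned above.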

Homogeneous equations \(A\mathbf{x} = B\mathbf{x}\) can be solved using the *elimination* method, which is based on the fact that the set of solutions of a homogeneous system is a finitely generated semimodule (Butkovič and Hegedüs 1984) (independently rediscovered by Gaubert 1992). If only a single solution is required, then according to Gaubert (1997), a method by Walkup and Borriello (1998) is usually the fastest in practice.

Another important direction of research is the eigenvalue problem \(A\mathbf{x} = \lambda \mathbf{x}\). Tropical analogues of the Perron–Frobenius theorem (see e.g. Vorobyev 1967; Maslov 1992) and the Collatz–Wielandt formula (Bapat et al. 1995; Gaubert 1992) have been developed. For a general overview of results in \((\max , +)\) spectral theory, see for example Gaubert (1997).
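As a toy illustration of the max-plus eigenproblem: for an irreducible matrix, the unique max-plus eigenvalue equals the maximum cycle mean of the matrix's weighted digraph. The sketch below checks this on a hand-picked \(2\times 2\) example (the matrix and eigenvector are our own illustrative choices):

```python
import numpy as np

def maxplus_matvec(A, v):
    # (A (x) v)_i = max_j (A[i, j] + v[j])  -- max-plus matrix-vector product
    return np.max(A + v[None, :], axis=1)

A = np.array([[0.0, 3.0], [1.0, 2.0]])

# Candidate cycles in the 2-node digraph of A: the two self-loops and
# the 2-cycle 1 -> 2 -> 1; the eigenvalue is the maximum cycle mean.
lam = max(A[0, 0], A[1, 1], (A[0, 1] + A[1, 0]) / 2)  # = 2.0

v = np.array([1.0, 0.0])  # an eigenvector for this eigenvalue
print(np.allclose(maxplus_matvec(A, v), lam + v))  # True: A (x) v = lam (x) v
```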

Tropical algebra and tropical geometry were used by Gärtner and Jaggi (2008) to construct a tropical analogue of an SVM. Unlike in the classical case, tropical SVMs are localized, in the sense that the kernel at any given point is not influenced by all of the support vectors. Their work also exploits the fact that tropical hyperplanes are somewhat more complex than their counterparts in classical geometry, which makes it possible to do multiple-category classification with a single hyperplane.

## 7 Conclusions

Subtropical low-rank factorizations are a novel approach for finding latent structure in nonnegative data. The factorizations admit a winner-takes-it-all interpretation: the value of an element in the final reconstruction depends only on the largest of the values in the corresponding elements of the rank-1 components (compare that to NMF, where the value in the reconstruction is the *sum* of the corresponding elements). That the factorizations are different does not necessarily mean that they are better in terms of reconstruction error, although they can yield lower reconstruction error than even SVD. It does mean, however, that they find different structure in the data. This is an important advantage, as it allows the data analyst to use both the classical factorizations and the subtropical factorizations to get a broader understanding of the kinds of patterns that are present in the data.

Working in the subtropical algebra is harder than in the normal algebra, though. The various definitions of rank, for example, do not agree, and computing many of them—including the subtropical Schein rank, which is arguably the most useful one for data analysis—is computationally hard. That said, our proposed algorithms, Capricorn and Cancer, can find the subtropical structure when it is present in the data. Not all data has subtropical structure, though, and due to the complexity of finding the optimal subtropical factorization we cannot distinguish between the cases where our algorithms fail to find the latent subtropical structure and those where it does not exist. Based on our experiments with synthetic data, our hypothesis is that a failure to find a good factorization more likely indicates the lack of subtropical structure than a failure of the algorithms. Naturally, more experiments using data with known subtropical structure should improve our confidence in the correctness of this hypothesis.

The presented algorithms are heuristics. Developing algorithms that achieve better reconstruction error is naturally an important direction for future work. In our general framework, this hinges on the task of finding the rank-1 components. In addition, the scalability of the algorithms could be improved. A potential direction could be to take into account the sparsity of the factor matrices in dominated decompositions; this could allow one to concentrate only on the non-zero entries of the factor matrices.

The connection between Boolean and (sub-)tropical factorizations raises potential directions for future work. The continuous framework could allow for easier optimization in the Boolean algebra. Also, the connection allows us to model combinatorial structures (e.g. cliques in a graph) using subtropical matrices. This could allow for novel approaches to finding such structures using continuous subtropical factorizations.

## Footnotes

- 1.
The raw data is available at http://www.worldclim.org/, accessed 18 July 2017.

- 2.
- 3.
- 4.
https://github.com/aludnam/MATLAB/tree/master/nmfpack, accessed 18 July 2017.

- 5.
Submitted to the matrix repository by Csaba Meszaros.

- 6.
http://www.cise.ufl.edu/research/sparse/matrices/, accessed 18 July 2017.

- 7.
Submitted by Nicolas Thiery.

- 8.
The dataset can be obtained from the online personality testing website http://personality-testing.info/_rawdata/NPAS-data.zip, accessed 18 July 2017.

- 9.
http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html, accessed 18 July 2017.

- 10.
http://qwone.com/~jason/20Newsgroups/, accessed 18 July 2017.

- 11.
The authors are grateful to Ata Kabán for pre-processing the data, see Miettinen (2009).

- 12.
Available at https://data.gov.uk/dataset/land-registry-house-price-index-background-tables/, accessed 18 July 2017.

- 13.
Available at http://grouplens.org/datasets/movielens/100k/, accessed 18 July 2017.

- 14.
The values are different than those presented by Karaev and Miettinen (2016b) because we used Frobenius error instead of \(L_1\) and counted all elements towards the error, not just nonnegative ones.

- 15.
Available for research purposes from the Societas Europaea Mammalogica at http://www.european-mammals.org.


### Acknowledgements

Open access funding provided by University of Eastern Finland (UEF) including Kuopio University Hospital.

## References

- Akian M (1999) Densities of idempotent measures and large deviations. Trans Am Math Soc 351(11):4515–4543. https://doi.org/10.1090/S0002-9947-99-02153-4
- Akian M, Bapat R, Gaubert S (2007) Max-plus algebra. In: Hogben L (ed) Handbook of linear algebra. Chapman & Hall/CRC, Boca Raton
- Akian M, Gaubert S, Guterman A (2009) Linear independence over tropical semirings and beyond. Contemp Math 495:1–38
- Baccelli F, Cohen G, Olsder GJ, Quadrat JP (1992) Synchronization and linearity: an algebra for discrete event systems. Wiley, Hoboken. https://doi.org/10.2307/2583959
- Bapat R, Stanford DP, Van den Driessche P (1995) Pattern properties and spectral inequalities in max algebra. SIAM J Matrix Anal Appl 16(3):964–976. https://doi.org/10.1137/S0895479893251782
- Berry MW, Browne M, Langville AN, Pauca VP, Plemmons RJ (2007) Algorithms and applications for approximate nonnegative matrix factorization. Comput Stat Data Anal 52(1):155–173. https://doi.org/10.1016/j.csda.2006.11.006
- Blondel VD, Gaubert S, Tsitsiklis JN (2000) Approximating the spectral radius of sets of matrices in the max-algebra is NP-hard. IEEE Trans Autom Control 45(9):1762–1765. https://doi.org/10.1109/9.880644
- Brunet JP, Tamayo P, Golub TR, Mesirov JP (2004) Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci USA 101(12):4164–4169. https://doi.org/10.1073/pnas.0308531101
- Butkovič P (2003) Max-algebra: the linear algebra of combinatorics? Linear Algebra Appl 367:313–335. https://doi.org/10.1016/S0024-3795(02)00655-9
- Butkovič P (2010) Max-linear systems: theory and algorithms. Springer, New York. https://doi.org/10.1007/978-1-84996-299-5
- Butkovič P, Hegedüs G (1984) An elimination method for finding all solutions of the system of linear equations over an extremal algebra. Ekon-Mat Obzor 20(2):203–215
- Butkovič P, Hevery F (1985) A condition for the strong regularity of matrices in the minimax algebra. Discrete Appl Math 11(3):209–222. https://doi.org/10.1016/0166-218X(85)90073-3
- Cassandras CG, Lafortune S (2008) Introduction to discrete event systems, 2nd edn. Springer, Berlin. https://doi.org/10.1007/978-0-387-68612-7
- Cichocki A, Zdunek R, Phan AH, Amari S (2009) Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. Wiley, Chichester. https://doi.org/10.1002/9780470747278
- Cohen G, Gaubert S, Quadrat JP (1999) Max-plus algebra and system theory: where we are and where to go now. Annu Rev Control 23:207–219. https://doi.org/10.1016/S1367-5788(99)90091-3
- Cohen JE, Rothblum UG (1993) Nonnegative ranks, decompositions, and factorizations of nonnegative matrices. Linear Algebra Appl 190:149–168. https://doi.org/10.1016/0024-3795(93)90224-C
- Cuninghame-Green RA (1979) Minimax algebra. Springer, Berlin. https://doi.org/10.1007/978-3-642-48708-8
- Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Trans Math Soft 38(1):1–25. https://doi.org/10.1145/2049662.2049663
- De Schutter B, De Moor B (2002) The QR decomposition and the singular value decomposition in the symmetrized max-plus algebra revisited. SIAM Rev 44(3):417–454. https://doi.org/10.1137/S00361445024039
- Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41:391–407. https://doi.org/10.1002/(SICI)1097-4571(199009)41:6
- Dembo A, Zeitouni O (2010) Large deviations techniques and applications, 2nd edn. Springer, Berlin. https://doi.org/10.1007/978-3-642-03311-7
- Gärtner B, Jaggi M (2008) Tropical support vector machines. Technical report, ACS-TR-362502-01
- Gaubert S (1992) Théorie des systèmes linéaires dans les dioïdes. PhD thesis, Ecole nationale supérieure des mines de Paris
- Gaubert S (1997) Methods and applications of (max,+) linear algebra. In: 14th Annual symposium on theoretical aspects of computer science (STACS). Springer, pp 261–282. https://doi.org/10.1007/BFb0023465
- Georghiades AS, Belhumeur PN, Kriegman DJ (2000) From few to many: generative models for recognition under variable pose and illumination. In: 4th IEEE international conference on automatic face and gesture recognition (FG), pp 277–284. https://doi.org/10.1109/AFGR.2000.840647
- Gillis N, Glineur F (2010) Using underapproximations for sparse nonnegative matrix factorization. Pattern Recognit 43(4):1676–1687. https://doi.org/10.1016/j.patcog.2009.11.013
- Golub GH, Van Loan CF (2012) Matrix computations, 4th edn. Johns Hopkins University Press, Baltimore
- Gondran M, Minoux M (1984a) Graphs and algorithms. Wiley, New York
- Gondran M, Minoux M (1984b) Linear algebra in dioids: a survey of recent results. North-Holland Math Stud 95:147–163. https://doi.org/10.1016/S0304-0208(08)72960-8
- Guillon P, Izhakian Z, Mairesse J, Merlet G (2015) The ultimate rank of tropical matrices. J Algebra 437:222–248. https://doi.org/10.1016/j.jalgebra.2015.02.026
- Hoyer PO (2004) Non-negative matrix factorization with sparseness constraints. J Mach Learn Res 5:1457–1469
- Jha SK, Yadava R (2011) Denoising by singular value decomposition and its application to electronic nose data processing. IEEE Sens J 11(1):35–44. https://doi.org/10.1109/JSEN.2010.2049351
- Karaev S, Miettinen P (2016a) Cancer: another algorithm for subtropical matrix factorization. In: European conference on machine learning and principles and practice of knowledge discovery in databases (ECML PKDD), pp 576–592. https://doi.org/10.1007/978-3-319-46227-1_36
- Karaev S, Miettinen P (2016b) Capricorn: an algorithm for subtropical matrix factorization. In: 16th SIAM international conference on data mining (SDM), pp 702–710. https://doi.org/10.1137/1.9781611974348.79
- Karaev S, Miettinen P, Vreeken J (2015) Getting to know the unknown unknowns: destructive-noise resistant Boolean matrix factorization. In: 15th SIAM international conference on data mining (SDM), pp 325–333. https://doi.org/10.1137/1.9781611974010.37
- Kim KH (1982) Boolean matrix theory and applications. Marcel Dekker, New York
- Kim KH, Roush FW (2005) Factorization of polynomials in one variable over the tropical semiring. Technical report. arXiv:math/0501167
- Kolda T, O’Leary D (2000) Algorithm 805: computation and uses of the semidiscrete matrix decomposition. ACM Trans Math Softw 26(3):415–435. https://doi.org/10.1145/358407.358424
- Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791. https://doi.org/10.1038/44565
- Li Y, Ngom A (2013) The non-negative matrix factorization toolbox for biological data mining. Source Code Biol Med 8(1):1–15. https://doi.org/10.1186/1751-0473-8-10
- Lu H, Vaidya J, Atluri V (2008) Optimal boolean matrix decomposition: Application to role engineering. In: 24th IEEE international conference on data engineering (ICDE), pp 297–306. https://doi.org/10.1109/ICDE.2008.4497438
- Lucchese C, Orlando S, Perego R (2014) A unifying framework for mining approximate top-\(k\) binary patterns. IEEE Trans Knowl Data Eng 26(12):2900–2913. https://doi.org/10.1109/TKDE.2013.181
- Maslov V (1992) Idempotent analysis. American Mathematical Society, Providence
- Miettinen P (2009) Matrix decomposition methods for data mining: computational complexity and algorithms. PhD thesis, University of Helsinki
- Miettinen P, Mielikäinen T, Gionis A, Das G, Mannila H (2008) The discrete basis problem. IEEE Trans Knowl Data Eng 20(10):1348–1362
- Mitchell-Jones A, Amori G, Bogdanowicz W, Krystufek B, Reijnders PH, Spitzenberger F, Stubbe M, Thissen J, Vohralik V, Zima J (1999) The atlas of European mammals. Academic Press, London
- Paatero P (1997) Least squares formulation of robust non-negative factor analysis. Chemometr Intell Lab 37(1):23–35. https://doi.org/10.1016/S0169-7439(96)00044-5
- Paatero P (1999) The multilinear engine-table-driven, least squares program for solving multilinear problems, including the \(n\)-way parallel factor analysis model. J Comp Graph Stat 8(4):854–888. https://doi.org/10.1080/10618600.1999.10474853
- Paatero P, Tapper U (1994) Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2):111–126. https://doi.org/10.1080/10618600.1999.10474853
- Pauca VP, Shahnaz F, Berry MW, Plemmons RJ (2004) Text mining using nonnegative matrix factorizations. In: 4th SIAM international conference on data mining (SDM), pp 22–24. https://doi.org/10.1137/1.9781611972740.45
- Salomaa A, Soittola M (2012) Automata-theoretic aspects of formal power series. Springer, New York
- Sarwar B, Karypis G, Konstan J, Riedl J (2000) Application of dimensionality reduction in recommender system—a case study. Technical report, GroupLens Research Group
- Shitov Y (2014) The complexity of tropical matrix factorization. Adv Math 254:138–156. https://doi.org/10.1016/j.aim.2013.12.013
- Simon I (1978) Limited subsets of a free monoid. In: 19th IEEE annual symposium on foundations of computer science (FOCS), pp 143–150. https://doi.org/10.1109/SFCS.1978.21
- Simon I (1994) On semigroups of matrices over the tropical semiring. Inform Theor Appl 28(3–4):277–294. https://doi.org/10.1051/ita/1994283-402771
- Skillicorn D (2007) Understanding complex datasets: data mining with matrix decompositions. Data Mining and Knowledge Discovery. Chapman & Hall/CRC, Boca Raton. https://doi.org/10.1007/s00362-008-0147-y
- Srebro N, Rennie J, Jaakkola TS (2004) Maximum-margin matrix factorization. In: 17th Advances in neural information processing systems (NIPS), pp 1329–1336
- Vavasis SA (2009) On the complexity of nonnegative matrix factorization. SIAM J Optim 20(3):1364–1377. https://doi.org/10.1137/070709967
- Vorobyev N (1967) Extremal algebra of positive matrices. Elektron Informationsverarbeitung und Kybernetik 3:39–71
- Walkup EA, Borriello G (1998) A general linear max-plus solution technique. In: Gunawardena J (ed) Idempotency. Cambridge University Press, Cambridge, pp 406–415. https://doi.org/10.1017/CBO9780511662508.024
- Weston J, Weiss RJ, Yee H (2013) Nonlinear latent factorization by embedding multiple user interests. In: 7th ACM conference on recommender systems (RecSys), pp 65–68. https://doi.org/10.1145/2507157.2507209
- Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: 26th Annual international ACM SIGIR conference (SIGIR), pp 267–273. https://doi.org/10.1145/860435.860485
- Zimmermann U (2011) Linear and combinatorial optimization in ordered algebraic structures. Elsevier, Amsterdam

## Copyright information

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.