Computational Neuroscience, pp. 85–112
Recent Advances of Data Biclustering with Application in Computational Neuroscience
Abstract
Clustering and biclustering are important techniques in data mining. Unlike clustering, biclustering simultaneously groups the objects and the features according to their expression levels. In this review, the background, motivation, data input, objective tasks, and history of data biclustering are carefully studied. The bicluster types and biclustering structures of a data matrix are defined mathematically. The most recent algorithms, including OREO, nsNMF, BBC, cMonkey, etc., are reviewed with formal mathematical models. Additionally, a match score between biclusters is defined to compare algorithms. The application of biclustering in computational neuroscience is also reviewed in this chapter.
Keywords
Lyapunov Exponent · Bipartite Graph · Data Matrix · Vagus Nerve Stimulation · Nonnegative Matrix Factorization
6.1 Introduction
6.1.1 Motivation
With the number of databases appearing in computational biology, biomedical engineering, consumer behavior surveys, and social networks, finding the useful information behind these data and grouping the data are important issues nowadays. Clustering is a method to classify objects into different groups so that the objects in each group share some common traits [15, 31, 57]. After this step, the data are reduced to small subgroups, and research on each subgroup becomes easier and more direct. Clustering has been widely studied in the past 20 years; a general review of clustering is given by Jain et al. [31], and a survey of clustering algorithms is available by Xu et al. [57]. The future challenges in biological networks are discussed in the book edited by Chaovalitwongse et al. [9].
However, clustering only groups the objects, without considering the features each object may have. In other words, clustering compares two objects by the features the two share, without depicting the features on which they differ. A method that simultaneously groups the objects and the features, so that a specific group of objects is associated with a specific group of features, is called biclustering. More precisely, biclustering finds subsets of objects and features such that the selected objects are related to the selected features to some level. Such subsets are called biclusters. Meanwhile, biclustering does not require objects in the same bicluster to behave similarly over all possible features, but only to exhibit the specific features of the bicluster strongly.
Besides the differences from clustering mentioned above, biclustering also has the ability to find hidden features and attach them to specific subsets of objects. We should also realize that biclustering is related to, but different from, other techniques in data mining, such as classification, feature selection, and outlier detection. Classification is a kind of supervised clustering, while most algorithms used in biclustering are unsupervised; for some supervised biclustering, see [4, 40].
The biclustering problem is to find biclusters in data sets; it may appear under different names, such as co-clustering or two-mode clustering, in some literature.
6.1.2 Data Input
Usually, we call the objects samples. Samples have different features, and each sample may or may not have some feature. The level to which a sample exhibits a specific feature is called its expression level. In the real world, samples may have quantitative or qualitative features. The expression levels of quantitative features can easily be expressed as numerical data, while qualitative features have to be transformed into data by some scale measurement. Some biclustering algorithms allow qualitative features.

Expression Matrix. This data matrix has rows corresponding to samples and columns to features, with each entry measuring the expression level of a feature in a sample. Each row is called the feature vector of its sample. We can also call this matrix a sample-by-feature matrix.
Sometimes the matrix is formed from all samples' feature vectors, and the features' levels in each sample are observed directly. Generally, we just scale these vectors and put them together to form a matrix if all vectors have the same length, which means they cover the same set of features. However, the feature vectors may not conform to each other. In that case, we should add values (possibly 0) to vectors for the missing features in order to form vectors of the same length. In some applications, there is a large set of samples with a limited number of features.
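As a concrete illustration, the construction and padding step above can be sketched as follows (the feature names and values are hypothetical):

```python
import numpy as np

# Hypothetical feature vectors for three samples; the last sample is
# missing feature "f3", so we pad with 0 to obtain equal-length vectors.
features = ["f1", "f2", "f3"]
s1 = {"f1": 2.0, "f2": 0.5, "f3": 1.1}
s2 = {"f1": 1.8, "f2": 0.7, "f3": 0.9}
s3 = {"f1": 2.2, "f2": 0.4}            # "f3" unobserved

# Sample-by-feature expression matrix A (rows: samples, columns: features).
A = np.array([[s.get(f, 0.0) for f in features] for s in (s1, s2, s3)])
```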

Similarity Matrix. This data matrix has both rows and columns corresponding to a set of samples, with each entry measuring the similarity between the two corresponding samples. It has the same number of rows and columns, and it is symmetric. This matrix can be called a sample-by-sample matrix.
Note: this matrix can also be used as a dissimilarity matrix, with each entry denoting the dissimilarity between a pair of samples. There are many similarity functions to compute the (dis)similarity entries, such as the Euclidean distance and the Mahalanobis distance, so the similarity matrix can be computed from the expression matrix.
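Taking the Euclidean distance mentioned above, such a sample-by-sample matrix can be computed from the expression matrix, for instance:

```python
import numpy as np

def dissimilarity_matrix(A):
    """Euclidean-distance dissimilarity (sample-by-sample) matrix
    computed from the sample-by-feature expression matrix A."""
    diff = A[:, None, :] - A[None, :, :]     # pairwise differences
    return np.sqrt((diff ** 2).sum(axis=2))  # n x n symmetric matrix

A = np.array([[1.0, 2.0], [1.0, 2.0], [4.0, 6.0]])
D = dissimilarity_matrix(A)
# D is symmetric with zero diagonal; identical samples have distance 0.
```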
Since the development of biclustering includes some time series models [38, 52], a third kind of data, time series data, is also used in biclustering. These data can again be viewed as stored in a matrix, whose rows denote samples and whose columns, from left to right, denote observed time points.
For some qualitative features, the data matrix is a kind of sign matrix; some biclustering algorithms can still be applied.
Sometimes, before running algorithms on the matrix, preprocessing steps such as normalization, discretization, value mapping, and aggregation are applied; the details of these data preparation operations are available in [16].
In the following, the term data matrix refers to the first kind, the expression matrix, unless stated otherwise.
6.1.3 Objective of Task
Obviously, the objective of biclustering is to find biclusters in data. In clustering, the obtained clusters should have the properties that the similarities among the samples within each cluster are maximized and the similarities between samples from different clusters are minimized.
For biclustering, the samples and features in each bicluster are highly related. This does not mean the samples in a bicluster have no other features; the features in the bicluster are simply more pronounced, and the samples may still share other features. Thus, within each bicluster, the relations between the samples and the features are closer than the relations between samples (features) of this bicluster and features (samples) of another bicluster.
Some biclustering algorithms allow one sample or feature to belong to several biclusters (called overlapping), while others produce exclusive biclusters. In addition, some algorithms require that each sample or feature belong to some bicluster, while others need not be exhaustive and may extract only one submatrix, or several, from the data matrix to form the biclusters.
As mentioned above, most biclustering algorithms perform unsupervised classification and do not need any training set. But supervised biclustering methods are also useful in some biomedical applications [4, 5, 40].
In this chapter, an optimization perspective of biclustering is studied, and different objective functions are used by different algorithms to satisfy part of the objectives above. No single algorithm satisfies all objectives, and, additionally, there is no standard way of judging the algorithms. In distinct applications of biclustering, one or several specific objectives should be met, so algorithms are designed to satisfy those requirements. There are some methods for comparing different algorithms; we refer to [37, 44, 47, 61].
6.1.4 History
The first approach to biclustering is the “direct clustering of a data matrix” by Hartigan [28] in 1972. But the term “biclustering” became popular after Cheng and Church [11] used this technique for gene expression analysis. Since then, many biclustering algorithms have been designed for applications in different areas, such as biological networks, microarray data, word–document co-clustering, and biomedical engineering, of which the most popular applications are in microarray and gene expression data.
In 2004, Madeira and Oliveira [37] surveyed the biclustering algorithms for biological data analysis. In this survey, they classified biclusters into four major classes: biclusters with constant values, with constant values on rows or columns, with coherent values, and with coherent evolutions. The biclustering structures of a data matrix are classified into nine groups: single bicluster, exclusive row and column biclusters, checkerboard structure, exclusive-rows biclusters, exclusive-columns biclusters, nonoverlapping biclusters with tree structure, nonoverlapping nonexclusive biclusters, overlapping biclusters with hierarchical structure, and arbitrarily positioned overlapping biclusters. In addition, the authors divided the algorithms into five classes: iterative row and column clustering combination, divide and conquer, greedy iterative search, exhaustive bicluster enumeration, and distribution parameter identification. A comparison of the algorithms according to these three classifications is given in the survey.
Another review of biclustering algorithms is by Tanay et al. [55] in 2004. In that survey, nine of the most used algorithms are reviewed together with their pseudocodes. The most recent review of biclustering is by Busygin et al. [5], where 16 algorithms are reviewed with their applications in biomedicine and text mining. There, the authors note that “many of the approaches rely on not mathematically strict arguments and there is a lack of methods to justify the quality of the obtained biclusters.”
In this chapter, we try to review and study biclustering algorithms from mathematical and optimization perspectives. Not all algorithms are covered, but the most valuable recent ones are.
Since the development of biclustering algorithms, many software tools have been designed that each include several algorithms, such as BicAT [2], BicOverlapper [48], BiVisu [10], and the R package biclust [32]. These tools support data processing, bicluster analysis, and visualization of results, and can be used directly to construct images.
The toolbox BicAT [2] provides facilities for data preparation, inspection, and postprocessing, such as discretization and filtering of biclusters. Several biclustering algorithms, such as Bimax, CC, xMotifs, and OPSM, are included, and three ways of viewing the data are provided: matrix (heatmap), expression, and analysis views. The software BicOverlapper [48] is a tool for visualizing overlapping biclusters. It can use three different kinds of data files, holding the original data matrix and the resulting biclusters, to construct informative, colorful images such as heatmaps, parallel coordinates, TRN graphs, bubble maps, and overlappers. BiVisu [10] is also a software tool for bicluster detection and visualization. Besides bicluster detection, BiVisu provides functions for preprocessing, filtering, and bicluster analysis. Another tool is the R package biclust [32], which contains a collection of bicluster algorithms, such as Bimax, CC, plaid, spectral, and xMotifs, preprocessing methods for two-way data, and validation and visualization techniques for bicluster results. For individual biclustering algorithms, software packages are also available [5, 55].
6.1.5 Outline
In this chapter, we follow the reviews [5, 37, 55] and try to include the most recent algorithms and advances in biclustering. The perspective of this chapter is mathematical, drawing on linear algebra, optimization, bipartite graphs, probabilistic and statistical models, information theory, and time series. Section 6.1 has reviewed the motivation, data, objectives, history, and software of biclustering. In Section 6.2, the bicluster types and biclustering structures are formally defined in a mathematical way. The most recent biclustering algorithms are reviewed in Section 6.3, where a comparison score is also defined. The application of biclustering in computational neuroscience is reviewed in Section 6.4, and conclusions and future work are in Section 6.5.
6.2 Biclustering Types and Structures
6.2.1 Notations
As mentioned in Section 6.1.2, the expression matrix is the one mostly used in biclustering. Let \(A=(a_{ij})_{n\times m}\) denote the sample–feature expression matrix, with n rows representing n samples, m columns representing m features, and entry \(a_{ij}\) denoting the expression level of feature j in sample i. Mostly, the matrix A is the required input of an algorithm, but some algorithms also use the space of samples or features.
Let \(\mathcal{S}=\{S_1,S_2,{\cdots},S_n\}\) be the sample set, where \(S_i=(a_{i1},a_{i2},{\cdots},a_{im})\) is also called the feature vector of sample i. Similarly, for the features, it is denoted by \(\mathcal{F}=\{F_1,F_2,{\cdots},F_m\}\) with each vector \(F_j=(a_{1j},a_{2j},{\cdots},a_{nj})^T\), a column vector. Thus, the matrix \(A=(S_1,S_2,{\cdots},S_n)^T=(F_1,F_2,{\cdots},F_m)\).
A bicluster is a submatrix of the data matrix. It is denoted by \(B_k=(\mathcal{S}_k,\mathcal{F}_k)\), with \(\mathcal{S}_k\subseteq \mathcal{S}\) and \(\mathcal{F}_k\subseteq \mathcal{F}\), whose entries are the entries of A at the corresponding rows (samples) and columns (features). Assume that K biclusters are found in the data matrix A; the set of biclusters is denoted by \(\mathcal{B}=\{B_k: k=1,2,\cdots,K\}\). Sometimes we use \((\mathcal{S}_k,\mathcal{F})\) to denote a cluster of rows (samples) and \((\mathcal{S},\mathcal{F}_k)\) a cluster of columns (features). In some algorithms, the number of row clusters is not equal to the number of column clusters. Letting K, K′ denote the numbers of row clusters and column clusters, respectively, the set of biclusters is \(\mathcal{B}=\{(\mathcal{S}_k,\mathcal{F}_{k^{\prime}}):\,k=1,{\cdots},K,\ k^{\prime}=1,{\cdots},K^{\prime}\}\). Unless stated otherwise, we assume \(K=K^{\prime}\).
Additionally, \(|\mathcal{S}_k|\) denotes the cardinality of \(\mathcal{S}_k\), i.e., the number of samples in bicluster \(B_k=(\mathcal{S}_k,\mathcal{F}_k)\), and similarly \(|\mathcal{F}_k|\) the number of features. Clearly, \(|\mathcal{S}|=n,|\mathcal{F}|=m\). In the following, the notation \(i\in \mathcal{S}_k\ (j\in \mathcal{F}_k)\) is short for \(S_i\in \mathcal{S}_k\ (F_j\in \mathcal{F}_k)\) when no confusion arises.
Given a data matrix A, the biclustering problem is to design algorithms that find biclusters \(\mathcal{B}=\{B_k: k=1,2,{\cdots},K\}\), i.e., a set of submatrices of A such that the samples (rows, \(\mathcal{S}_k\)) of each bicluster \(B_k\) exhibit some similar behavior under the corresponding features (columns, \(\mathcal{F}_k\)). From this point of view, the biclustering problem is transformed into a mathematical problem with some requirements (defined below for the different bicluster types and structures). Usually, after finding the biclusters in a data matrix, the rows and columns are rearranged so that the samples/features of a same bicluster appear together; the resulting matrix is called a proper rearrangement matrix. In the following discussion of bicluster types and biclustering structures, the requirements all refer to this rearrangement of the data matrix.
6.2.2 Bicluster Types
 1. Bicluster with constant values. For a bicluster \(B_k=(\mathcal{S}_k,\mathcal{F}_k)\), the following identity should be satisfied:$$a_{ij}=\mu, \forall i\in \mathcal{S}_k, \forall j\in \mathcal{F}_k,$$where μ is a constant number.
 2. Bicluster with constant values on rows or columns. For a bicluster \(B_k=(\mathcal{S}_k,\mathcal{F}_k)\) with constant values on rows, the identity is$$a_{ij}=\mu+\alpha_i,\ \textrm{or}\ a_{ij}=\mu\times \alpha_i, \forall i\in \mathcal{S}_k, \forall j\in \mathcal{F}_k,$$where μ is a constant and \(\alpha_i\) is an adjustment number for row i. The first identity is additive and the second one multiplicative. Note that after some data processing steps the two become equivalent, for example, after a logarithmic transformation of the data matrix in the multiplicative case. For the case of constant values on columns, the identity is$$a_{ij}=\mu+\beta_j,\ \textrm{or}\ a_{ij}=\mu\times \beta_j, \forall i\in \mathcal{S}_k, \forall j\in \mathcal{F}_k,$$where μ is a constant and \(\beta_j\) is an adjustment number for column j.
 3. Bicluster with coherent values. For a bicluster \(B_k=(\mathcal{S}_k,\mathcal{F}_k)\) with coherent values, there are two transferable expressions. The first one is additive,$$a_{ij}=\mu+\alpha_i+\beta_j, \forall i\in \mathcal{S}_k, \forall j\in \mathcal{F}_k,$$and the second one is multiplicative,$$a_{ij}=\mu\times\alpha_i\times\beta_j, \forall i\in \mathcal{S}_k, \forall j\in \mathcal{F}_k.$$Again, the multiplicative form can be transformed into the additive one by a logarithmic transformation of the data matrix.
 4.
Bicluster with coherent evolutions. In the above three cases, the data matrix A is real valued. But in some cases, the algorithms look for relationships among rows or columns without considering the actual values. For example, in the order-preserving submatrix (OPSM) algorithm, a bicluster is a group of rows whose values induce a common linear order across a subset of columns. Thus, the value of \(a_{ij}\) itself is not always required, since only the relationships between entries are considered. Other cases of biclusters with coherent evolutions will be discussed with the corresponding algorithms below.
Although biclusters are classified into these four classes, there are still other forms when the output bicluster is meant to reflect some relationship between its rows and columns. For example, in [7], a δ-valid pattern is defined as a bicluster satisfying \(\max_{i\in \mathcal{S}_k}(a_{ij})-\min_{i\in \mathcal{S}_k}(a_{ij})<\delta,\ \forall j \in \mathcal{F}_k\).
Besides this, data initialization influences bicluster types; for example, row-normalizing a bicluster with constant values on rows (type 2) yields a bicluster with constant values (type 1). Similarly, column-normalizing a bicluster with constant values on columns (type 2) yields a bicluster with constant values (type 1).
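The type definitions and the normalization remark above can be checked numerically; the values of μ, \(\alpha_i\), \(\beta_j\) below are arbitrary:

```python
import numpy as np

mu = 2.0
alpha = np.array([0.0, 1.0, 3.0])        # row adjustments
beta = np.array([0.0, 0.5, 1.5, 2.0])    # column adjustments

# Type 3 (additive coherent values): a_ij = mu + alpha_i + beta_j.
B = mu + alpha[:, None] + beta[None, :]

# Type 2 (constant rows) is the special case beta = 0; subtracting each
# row mean (row normalization) then gives a type-1 constant bicluster.
B_rows = mu + alpha[:, None] + np.zeros((1, len(beta)))
B_norm = B_rows - B_rows.mean(axis=1, keepdims=True)
```

For the additive type, any two rows differ by a constant (here row 1 minus row 0 is \(\alpha_1-\alpha_0=1\) in every column), which is the coherence the definition captures.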
6.2.3 Biclustering Structures
The structure of biclustering is defined to be the relationships between biclusters from \(\mathcal{B}=\{B_k=(\mathcal{S}_k,\mathcal{F}_k): k=1,2,{\cdots},K\}\) based on the data matrix A.

Exclusive (nonexclusive). A biclustering structure is said to be row exclusive if \(\mathcal{S}_k \cap \mathcal{S}_{k^{\prime}}=\emptyset\) for any \(k,k^{\prime}\in \{1,{\cdots},K\}, k\neq k^{\prime}\); to be column exclusive if \(\mathcal{F}_k \cap\mathcal{F}_{k^{\prime}}=\emptyset\) for any \(k,k^{\prime}\in \{1,{\cdots},K\}, k\neq k^{\prime}\); and to be exclusive if it is both row exclusive and column exclusive.

Overlapping (nonoverlapping). A biclustering structure is said to be overlapping if some entry \(a_{ij}\) belongs to two or more biclusters; otherwise, it is nonoverlapping.

Exhaustive (nonexhaustive). A biclustering structure is said to be row exhaustive if any row S _{ i } belongs to at least one bicluster; to be column exhaustive if any column F _{ j } belongs to at least one bicluster; to be exhaustive if it is both row and column exhaustive. Otherwise, it is said to be nonexhaustive if some row or column does not belong to any bicluster.
Here, exclusive and overlapping are not opposites of each other, as can be seen from structure 7 below. The following biclustering structures are based on these three properties.
 1.
Single bicluster. In this structure, only one submatrix is found in A, i.e., \(K=1\) and \(\mathcal{B}=\{B_1=(\mathcal{S}_1,\mathcal{F}_1)\}\).
 2.Exclusive row and column biclusters. Given a data matrix A, as Definition 1 in [5], the structure of exclusive row and column biclusters \(\mathcal{B}=\{B_k=(\mathcal{S}_k,\mathcal{F}_k): k=1,2,{\cdots},K\}\) should satisfy the requirements as follows: For rows$$\begin{cases} \mathcal{S}_k\subseteq\mathcal{S},(k=1,{\cdots},K),\\ \mathcal{S}_1\cup \mathcal{S}_2\cup\cdots\cup \mathcal{S}_K=\mathcal{S},\\ \mathcal{S}_k\cap\mathcal{S}_{k^{\prime}}=\emptyset, k,k^{\prime}=1,{\cdots},K, k\neq k^{\prime}, \end{cases}$$(6.1)and for corresponding columns$$\begin{cases} \mathcal{F}_k\subseteq\mathcal{F},(k=1,\cdots,K),\\ \mathcal{F}_1\cup \mathcal{F}_2\cup\cdots\cup \mathcal{F}_K=\mathcal{F},\\ \mathcal{F}_k\cap \mathcal{F}_{k^{\prime}}=\emptyset, k,k^{\prime}=1,\cdots\!\!,K, k\neq k^{\prime}. \end{cases}$$(6.2)
In a proper rearrangement of the rows and columns of the data matrix A, the biclusters are submatrices arranged along the diagonal, without overlap between any two biclusters.
 3.Checkerboard biclusters. The clusters \(\{\mathcal{S}_k:k=1,{\cdots},K\}\) of samples \(\mathcal{S}\) and the clusters \(\{\mathcal{F}_k:k=1,{\cdots},K\}\) of features \(\mathcal{F}\) satisfy the same requirements (Equations (6.1) and (6.2)) as in structure 2. The set of checkerboard biclusters is$$\mathcal{B}=\{B_{{kk}^{\prime}}= (\mathcal{S}_k,\mathcal{F}_{k^{\prime}}):k,k^{\prime}=1,{\cdots},K\},$$
i.e., every entry of A belongs to exactly one bicluster.
Considering each bicluster as an entry, the proper rearrangement matrix of A is a K × K matrix with entries \(B_{kk^{\prime}}\). In some cases, the number of sample clusters \(\mathcal{S}_k\) need not equal the number of feature clusters \(\mathcal{F}_k\); this yields a rectangular rather than a square block matrix.
 4.Exclusive rows biclusters. Given a data matrix A, the structure of exclusive-rows biclusters \(\mathcal{B}=\{B_k=(\mathcal{S}_k,\mathcal{F}_k): k=1,2,{\cdots},K\}\) should satisfy the following requirements: for rows$$\begin{cases} \mathcal{S}_k\subseteq\mathcal{S},(k=1,{\cdots},K),\\ \mathcal{S}_1\cup \mathcal{S}_2\cup\cdots\cup \mathcal{S}_K=\mathcal{S},\\ \mathcal{S}_k\cap\mathcal{S}_{k^{\prime}}=\emptyset, k,k^{\prime}=1,{\cdots},K, k\neq k^{\prime}, \end{cases}$$(6.3)and for the corresponding columns$$\begin{cases} \mathcal{F}_k\subseteq\mathcal{F},(k=1,{\cdots},K),\\ \mathcal{F}_1\cup \mathcal{F}_2\cup\cdots\cup \mathcal{F}_K=\mathcal{F}. \end{cases}$$(6.4)
Compared with Equations (6.1) and (6.2) in structure 2, the requirements for rows are the same, but for columns, Equation (6.4) has no disjointness requirement between \(\mathcal{F}_k\) and \(\mathcal{F}_{k^{\prime}},k^{\prime} \neq k\). In this structure, some features (columns) may belong to two or more biclusters (submatrices), while each sample (row) belongs to exactly one bicluster (submatrix).
 5.Exclusive columns biclusters. Given a data matrix A, the structure of exclusive-columns biclusters \(\mathcal{B}=\{B_k=(\mathcal{S}_k,\mathcal{F}_k): k=1,2,{\cdots},K\}\) should satisfy the following requirements: for rows$$\begin{cases} \mathcal{S}_k\subseteq\mathcal{S},(k=1,{\cdots},K),\\ \mathcal{S}_1\cup \mathcal{S}_2\cup\cdots\cup \mathcal{S}_K=\mathcal{S}, \end{cases}$$(6.5)and for the corresponding columns$$\begin{cases} \mathcal{F}_k\subseteq\mathcal{F},(k=1,{\cdots},K),\\ \mathcal{F}_1\cup \mathcal{F}_2\cup\cdots\cup \mathcal{F}_K=\mathcal{F},\\ \mathcal{F}_k\cap \mathcal{F}_{k^{\prime}}=\emptyset, k,k^{\prime}=1,{\cdots},K, k\neq k^{\prime}. \end{cases}$$(6.6)
Compared with Equations (6.1) and (6.2) in structure 2, the requirements for columns are the same, but for rows, Equation (6.5) has no disjointness requirement between \(\mathcal{S}_k\) and \(\mathcal{S}_{k^{\prime}},k^{\prime}\neq k\). In this structure, some samples (rows) may belong to two or more biclusters (submatrices), while each feature (column) belongs to exactly one bicluster (submatrix).
 6.
Nonoverlapping with treestructured biclusters. For a data matrix A, nonoverlapping means no entry can belong to more than one bicluster. Thus some entries may not belong to any bicluster. Tree structure means in the proper rearrangement matrix, the blocks of submatrices (biclusters) are not crossing each other.
 7.
Nonoverlapping nonexclusive biclusters. Nonoverlapping is the same as above. Nonexclusive means a sample or feature can belong to more than one bicluster: a sample can belong to two sets of important features in two biclusters, and vice versa.
 8.
Nonoverlapping hierarchically structured biclusters. Nonoverlapping is the same as above. Hierarchically structured means a bicluster may be contained in some other “bigger” bicluster, i.e., in the set of biclusters \(\mathcal{B}=\{B_k=(\mathcal{S}_k,\mathcal{F}_k): k=1,2,{\cdots},K\}\) of data matrix A, there exist biclusters \(B_k=(\mathcal{S}_k,\mathcal{F}_k)\) and \(B_{k^{\prime}}=(\mathcal{S}_{k^{\prime}},\mathcal{F}_{k^{\prime}})\) such that \(\mathcal{S}_k \subseteq \mathcal{S}_{k^{\prime}}\) or \(\mathcal{F}_{k}\subseteq \mathcal{F}_{k^{\prime}}\).
 9.
Arbitrarily positioned overlapping biclusters. In the set of biclusters \(\mathcal{B}=\{B_k=(\mathcal{S}_k,\mathcal{F}_k): k=1,2,{\cdots},K\}\) of data matrix A, there exists some entry \(a_{ij}\) such that \(a_{ij}\in B_k\) and \(a_{ij}\in B_{k^{\prime}}\) with \(k \neq k^{\prime}\). Moreover, biclusters \(B_k,B_{k^{\prime}}\) may share some common samples or features.
Checking the nine biclustering structures against the above definitions of exclusive and exhaustive: structures 1 and 2 are exclusive; structure 3 is nonoverlapping; structure 1 is nonexhaustive; structures 2, 3, 4, and 5 are exhaustive; the properties of the other structures follow from their classification. Note that these structures are not always strict. For example, structures 2, 3, 4, and 5 also have nonexclusive versions (which do not satisfy the formal requirements above); for details we refer to [37].
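These properties can be stated operationally. The sketch below checks row exclusiveness, overlap, and row exhaustiveness for biclusters given as (row set, column set) pairs, and illustrates with a structure-7-style example that nonexclusive and nonoverlapping can coexist:

```python
def row_exclusive(biclusters):
    """biclusters: list of (row_set, col_set) pairs."""
    rows = [r for r, _ in biclusters]
    return all(r1.isdisjoint(r2)
               for i, r1 in enumerate(rows) for r2 in rows[i + 1:])

def overlapping(biclusters):
    """True if some entry (i, j) lies in two or more biclusters."""
    cells = [{(i, j) for i in r for j in c} for r, c in biclusters]
    return any(a & b for i, a in enumerate(cells) for b in cells[i + 1:])

def row_exhaustive(biclusters, n):
    """True if every row index 0..n-1 belongs to at least one bicluster."""
    return set().union(*(r for r, _ in biclusters)) == set(range(n))

# The two biclusters share row 1 (so the structure is not row exclusive)
# but share no matrix entry (so it is nonoverlapping).
B = [({0, 1}, {0}), ({1, 2}, {1})]
```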
6.3 Biclustering Techniques and Algorithms
In this section, the biclustering techniques and algorithms are divided into several classes based on the methods used, drawn from different areas of mathematics, probability, and optimization. Here we concentrate on the mathematical backgrounds.
6.3.1 Based on Matrix Means and Residues
RWC. Angiulli et al. [1] proposed a random walk biclustering algorithm (RWC) based on a greedy technique enriched with a local search strategy to escape poor local minima. The algorithm starts with an initial random bicluster \(B_k\) and searches for a δ-bicluster by successive transformations of \(B_k\), as long as a gain function improves. The transformations consist of changes of membership (called flips or moves) of the row/column that leads to the largest increase of the gain function. If a bit is set from 0 to 1, the corresponding sample or feature, which was not included in the bicluster \(B_k\), is added to \(B_k\); vice versa, if a bit is set from 1 to 0, the corresponding sample or feature is removed from the bicluster.
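RWC's actual gain function is not reproduced in this chapter; as a rough sketch of a single greedy flip step, the following uses the mean squared residue of a submatrix (a standard residue-based quality measure, in the spirit of this section's title) as a stand-in, with lower residue treated as better:

```python
import numpy as np

def msr(A, rows, cols):
    """Mean squared residue of the submatrix (rows, cols); it is 0 for
    a perfectly additive-coherent bicluster."""
    sub = A[np.ix_(sorted(rows), sorted(cols))]
    res = (sub - sub.mean(1, keepdims=True)
               - sub.mean(0, keepdims=True) + sub.mean())
    return float((res ** 2).mean())

def best_row_flip(A, rows, cols):
    """Flip (add or remove) the single row membership that most
    decreases the residue -- one greedy step of an RWC-style search."""
    best = (msr(A, rows, cols), rows)
    for i in range(A.shape[0]):
        cand = rows ^ {i}                  # flip membership of row i
        if len(cand) >= 2:
            score = msr(A, cand, cols)
            if score < best[0]:
                best = (score, cand)
    return best[1]

# Rows 0-2 form an additive bicluster; row 3 is noise that the flip
# step should remove.
A = np.array([[0., 1., 2.],
              [1., 2., 3.],
              [2., 3., 4.],
              [5., 0., 9.]])
```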
6.3.2 Based on Matrix Ordering, Reordering, and Decomposition
The following several biclustering algorithms are based on matrix reordering or decomposition.
OPSM. Ben-Dor et al. [3] proposed the order-preserving submatrix (OPSM) algorithm for biclustering. A bicluster is defined as a submatrix that preserves the order of the selected columns for all of the selected rows. In other words, the expression values of the samples within a bicluster induce an identical linear ordering across the selected features. Based on a stochastic model, the authors [3] developed a deterministic algorithm to find large and statistically significant biclusters. This concept has been taken up in a recent study by Liu and Wang [36] as OP-cluster.
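The OPSM bicluster condition is easy to state as a predicate; a minimal sketch:

```python
import numpy as np

def order_preserving(A, rows, cols):
    """True if every selected row induces the same linear order over
    the selected columns -- the OPSM bicluster condition."""
    orders = {tuple(np.argsort(A[r, cols])) for r in rows}
    return len(orders) == 1

A = np.array([[1.0, 3.0, 2.0],
              [0.2, 0.9, 0.5],
              [7.0, 5.0, 6.0]])
# Rows 0 and 1 both order the columns as (0, 2, 1); row 2 does not.
```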
ISA. Ihmels et al. [30] proposed the iterative signature algorithm (ISA) for biclustering. Given the data matrix A, two matrices \(A^s,A^f\) are obtained by normalizing A such that \(\sum_i a^s_{ij}=0,\sum_i (a_{ij}^s)^2=1\) (zero mean, unit variance) for each feature \(F_j\), and similarly, for each sample \(S_i\), \(\sum_j a_{ij}^f=0,\sum_j (a_{ij}^f)^2=1\).
Starting with an initial set of samples, all features are scored with respect to this sample set and those features are chosen for which the score exceeds a predefined threshold. In the same way, all samples are scored regarding the selected features and a new set of samples is selected based on another predefined threshold. The entire procedure is repeated until the set of samples and the set of features do not change anymore. Multiple biclusters can be identified by running the iterative signature algorithm on several initial sample sets.
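A minimal sketch of this iteration (using a single matrix instead of the two normalized copies \(A^s,A^f\), and an illustrative thresholding rule of mean plus a fraction of a standard deviation) might look like:

```python
import numpy as np

def isa_step(A, samples, t=0.5):
    """One ISA-style iteration: score features over the current sample
    set, keep those scoring above threshold, then rescore and reselect
    samples over the chosen features. The threshold rule and the use of
    a single matrix are simplifications of the published algorithm."""
    f_scores = A[samples].mean(axis=0)
    feats = np.where(f_scores > f_scores.mean() + t * f_scores.std())[0]
    s_scores = A[:, feats].mean(axis=1)
    samples = np.where(s_scores > s_scores.mean() + t * s_scores.std())[0]
    return samples, feats

def isa(A, seed, n_iter=50):
    """Iterate from an initial sample set until a fixed point."""
    samples = np.asarray(seed)
    feats = np.arange(A.shape[1])
    for _ in range(n_iter):
        new_samples, feats = isa_step(A, samples)
        if np.array_equal(new_samples, samples):  # sets stopped changing
            break
        samples = new_samples
    return samples, feats
```

Starting from a seed inside a planted block, the iteration grows to the full block and stops there.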
xMotif. In the framework proposed by Murali and Kasif [39], biclusters are defined such that samples are nearly constantly expressed across the selected features. In the first step, the input matrix is preprocessed by assigning each sample a set of statistically significant states. These states define the set of valid biclusters: a bicluster is a submatrix in which each sample is in exactly the same state for all selected features. To identify the largest valid biclusters, an iterative search method is proposed that is run on different random seeds, similarly to ISA.
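The state-based validity condition can be sketched as follows; equal-width binning stands in for xMotif's statistically significant states, which are derived differently in [39]:

```python
import numpy as np

def states(A, n_states=2):
    """Assign each entry a discrete state by equal-width binning -- an
    illustrative stand-in for xMotif's statistically significant states."""
    lo, hi = A.min(), A.max()
    s = ((A - lo) / (hi - lo + 1e-12) * n_states).astype(int)
    return np.minimum(s, n_states - 1)

def valid_xmotif(S, rows, cols):
    """A bicluster is valid if each selected sample (row) stays in one
    and the same state across all selected features (columns)."""
    sub = S[np.ix_(rows, cols)]
    return bool((sub == sub[:, :1]).all())

S = states(np.array([[0.10, 0.20, 0.90],
                     [0.15, 0.10, 0.80]]))
```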
OREO. DiMaggio Jr. et al. [19] proposed an algorithm for the optimal reordering (OREO) of the rows and columns of the data matrix A for biclustering. The idea of OREO is to optimally rearrange the rows and columns of A so as to minimize the dissimilarities between neighboring rows and columns in the rearranged matrix. The algorithm has three main iterative steps: optimally reordering the rows (or columns) of the data matrix; computing the median for each pair of neighboring rows (or columns) in the final rearranged matrix, sorting these values from highest to lowest, and placing cluster boundaries between the rows (or columns) to obtain submatrices; and optimally reordering the columns (or rows) of each submatrix and computing the cluster boundaries for the reordered columns (or rows) analogously to the second step.
The two optimization problems induced by the models are mixed-integer linear programs and can be solved by CPLEX [14].
After reordering the rows of the data matrix, for rows i and i + 1 in the final rearranged matrix, the median of the pairwise terms of the objective function \(\phi(a_{i,j},a_{i+1,j})\) is computed as \(\textrm{MEDIAN}_j\,\phi (a_{i,j},a_{i+1,j})\). In [19], the top 10% largest median values are suggested as the boundaries between reordered rows.
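A sketch of this boundary step, taking φ to be the squared difference (one possible choice of pairwise term; [19] uses the terms of its reordering objective):

```python
import numpy as np

def row_boundaries(A, top_frac=0.10):
    """For each neighbouring row pair of an (already reordered) matrix,
    compute MEDIAN_j phi(a_ij, a_(i+1)j) with phi the squared
    difference, and place cluster boundaries at the top `top_frac`
    of median values. Returns indices i meaning 'cut between rows
    i and i+1'."""
    medians = np.median((A[:-1] - A[1:]) ** 2, axis=1)
    k = max(1, int(np.ceil(top_frac * len(medians))))
    cut_after = np.argsort(medians)[::-1][:k]
    return sorted(cut_after.tolist())

# Two tight groups of rows separated by a large jump between rows 1, 2.
A = np.array([[0.0, 0.0],
              [0.1, 0.1],
              [5.0, 5.0],
              [5.1, 5.1]])
```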
nsNMF. Pascual-Montano et al. [43] and Carmona-Saez et al. [8] proposed a biclustering algorithm based on nonsmooth nonnegative matrix factorization (nsNMF). The method approximates the data matrix A as a product of two matrices, W and H. Rows of H constitute basis samples, while columns of W are basis features. The coefficients in each pair of basis samples and features are used to sort the features and samples of the original matrix, respectively. The biclusters are submatrices of the sorted matrix.
When \(\theta = 0\), nsNMF reduces to standard NMF; when θ → 1, the vector SX (X a positive nonzero vector) tends to a constant vector whose elements all equal the average of the elements of X, which is the smoothest possible vector in the sense of “nonsparseness.” The objective function can be minimized in the same way as the previous one, with small changes [8].
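As an illustration of the factorization-and-sorting idea (using plain NMF with Lee–Seung multiplicative updates rather than the nonsmooth variant):

```python
import numpy as np

def nmf(A, k, n_iter=500, eps=1e-9):
    """Plain NMF via Lee-Seung multiplicative updates, a stand-in for
    nsNMF: A ~ W H with nonnegative W (n x k) and H (k x m)."""
    rng = np.random.default_rng(0)
    n, m = A.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Two clean blocks: rows/columns that load on the same factor of W/H
# end up grouped together when sorted by their dominant coefficient.
A = np.array([[5., 5., 0., 0.],
              [5., 5., 0., 0.],
              [0., 0., 5., 5.],
              [0., 0., 5., 5.]])
W, H = nmf(A, 2)
```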
Bimax. Prelic et al. [44] presented a fast divide-and-conquer approach, the binary inclusion-maximal biclustering algorithm (Bimax). This algorithm assumes that the data matrix A is binary, with \(a_{ij}\in \{0,1\}\), where an entry 1 means feature j is important in sample i.
In this algorithm, an inclusion-maximal bicluster is defined as \(B_k=(\mathcal{S}_k,\mathcal{F}_k)\) such that \(a_{ij}=1\) for all \(i \in \mathcal{S}_k,j \in \mathcal{F}_k\), and there is no other bicluster \(B_{k^{\prime}}=(\mathcal{S}_{k^{\prime}},\mathcal{F}_{k^{\prime}})\) of A with \(a_{ij}=1\) for every entry of \(B_{k^{\prime}}\) and \(\mathcal{S}_k \subseteq \mathcal{S}_{k^{\prime}},\mathcal{F}_k\subseteq \mathcal{F}_{k^{\prime}},(\mathcal{S}_k,\mathcal{F}_k)\neq(\mathcal{S}_{k^{\prime}},\mathcal{F}_{k^{\prime}})\).
The Bimax algorithm finds such inclusion-maximal biclusters of A; this differs from SAMBA, where a 0 entry can be contained in a bicluster. More specifically, the idea behind Bimax is to partition A into three submatrices, one of which contains only 0-cells and can therefore be disregarded. The algorithm is then applied recursively to the remaining two submatrices U and V; the recursion ends when the current matrix represents a bicluster, i.e., contains only 1s. If U and V do not share any rows or columns of A, the two matrices can be processed independently of each other. If U and V have a set of rows in common, special care is necessary to generate only those biclusters in V that share at least one common column.
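The definition above can be made concrete with a brute-force sketch (this illustrates only the *definition* of inclusion-maximal biclusters on a tiny binary matrix; the actual Bimax algorithm finds them by divide and conquer, not by enumeration):

```python
from itertools import combinations

def inclusion_maximal_biclusters(A):
    """Enumerate all inclusion-maximal all-1 submatrices of a binary
    matrix A, each returned as a (row_set, column_set) pair."""
    n, m = len(A), len(A[0])
    found = []
    for k in range(1, m + 1):
        for cols in combinations(range(m), k):
            # Largest row set that is all-1 on this column subset.
            rows = frozenset(i for i in range(n) if all(A[i][j] for j in cols))
            if rows:
                found.append((rows, frozenset(cols)))
    # Keep only biclusters not strictly contained in another one.
    return {
        (r, c) for (r, c) in found
        if not any((r, c) != (r2, c2) and r <= r2 and c <= c2
                   for (r2, c2) in found)
    }

A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
bics = inclusion_maximal_biclusters(A)
# Among them: the 2x2 block of 1s on rows {0,1} and columns {0,1}.
```

Note the exponential cost over column subsets, which is exactly what the divide-and-conquer recursion of Bimax avoids.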
6.3.3 Based on Bipartite Graphs
The following two algorithms are based on bipartite graphs, since there is a close relationship between the expression matrix of samples and features and a weighted bipartite graph.
A bipartite graph is defined as a graph \(G=(U,V,E)\), where U,V are two disjoint sets of vertices and E is the set of edges between vertices of U and V, while no edge appears between any two vertices within U or within V.
To perform biclustering, the data matrix A can be transformed into a bipartite graph where each vertex in the set U denotes a sample and each vertex in the set V denotes a feature. The expression level \(a_{ij}\) between sample i and feature j is encoded by the weighted edge \((u_i,v_j)\in E\) between vertices \(u_i \in U\) and \(v_j \in V\) with weight \(w_{ij}=a_{ij}\). A bicluster corresponds to a subgraph \(H_k=(U_k,V_k,E_k)\) of \(G=(U,V,E)\), where \(U_k \subseteq U\), \(V_k \subseteq V\), and \(E_k \subseteq E\) is the set of edges induced by the vertices of \(U_k\) and \(V_k\). Thus, the set \((\mathcal{S},\mathcal{F},A)\) corresponds to the bipartite graph \(G=(U,V,E)\), and the bicluster \(B_k=(\mathcal{S}_k,\mathcal{F}_k)\) to the subgraph \(H_k=(U_k,V_k,E_k)\). Sometimes we consider only one subgraph of G and denote it by \(H=(U ^{\prime},V ^{\prime},E ^{\prime})\). Clearly, \(|U|=n\) and \(|V|=m\).
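The matrix-to-graph transformation is direct; the following sketch builds the graph with plain Python containers (the tuple-labeled vertices are an illustrative convention, not from the source):

```python
def matrix_to_bipartite(A):
    """Build the weighted bipartite graph G = (U, V, E) of a data matrix A:
    U holds one vertex per sample (row), V one per feature (column), and
    each nonzero entry a_ij becomes an edge (u_i, v_j) with weight a_ij."""
    n, m = len(A), len(A[0])
    U = [("u", i) for i in range(n)]
    V = [("v", j) for j in range(m)]
    E = {(("u", i), ("v", j)): A[i][j]
         for i in range(n) for j in range(m) if A[i][j] != 0}
    return U, V, E

A = [[2, 0],
     [0, 3]]
U, V, E = matrix_to_bipartite(A)
# |U| = n = 2, |V| = m = 2, and E holds the two nonzero entries.
```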
Spectral biclustering. The first biclustering algorithm based on bipartite graphs is spectral biclustering, proposed by Dhillon [17]. Since this algorithm has close relationships, shown later, with spectral graph theory [13], it is called spectral biclustering. Before presenting the algorithm, several matrices are defined based on A and the bipartite graph \(G=(U,V,E)\) with edge weights \(w_{ij}=a_{ij}\).
Obviously, one objective of biclustering is to minimize the inter-similarities between biclusters (subgraphs). At the same time, the similarities within each bicluster should be maximized. The intra-similarity of the biclusters (subgraphs) is defined as \(\sum_{k} W(U_k,V_k)\), where \(W(U_k,V_k)=\sum_{u_i\in U_k, v_j\in V_k}w_{ij}\) is the total edge weight within subgraph \(H_k\). In order to balance the inter-similarities and intra-similarities of biclusters, several different cuts are defined, such as the ratio cut [27, 17, 33], normalized cut [51, 33], minimax cut [60], and ICA cut [45]. The most widely used are the ratio cut and the normalized cut.
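The two most common cuts can be computed directly from their standard definitions, as the following sketch shows for a two-part partition of a small bipartite graph (cut(P, rest) divided by part size for the ratio cut, and by part volume for the normalized cut):

```python
def cut_value(W, part1, part2):
    """Total weight of edges crossing between vertex sets part1 and part2."""
    return sum(W.get((u, v), 0) + W.get((v, u), 0)
               for u in part1 for v in part2)

def ratio_cut(W, parts):
    """Ratio cut: sum over parts P of cut(P, rest) / |P|."""
    total = 0.0
    for i, p in enumerate(parts):
        rest = [v for j, q in enumerate(parts) if j != i for v in q]
        total += cut_value(W, p, rest) / len(p)
    return total

def normalized_cut(W, parts, vertices):
    """Normalized cut: sum over parts P of cut(P, rest) / vol(P),
    where vol(P) is the sum of the degrees of the vertices of P."""
    def vol(S):
        return sum(W.get((u, v), 0) + W.get((v, u), 0)
                   for u in S for v in vertices)
    total = 0.0
    for i, p in enumerate(parts):
        rest = [v for j, q in enumerate(parts) if j != i for v in q]
        total += cut_value(W, p, rest) / vol(p)
    return total

# Bipartite graph of the 2x2 matrix [[5, 1], [1, 5]]:
W = {("u0", "v0"): 5, ("u0", "v1"): 1, ("u1", "v0"): 1, ("u1", "v1"): 5}
parts = [["u0", "v0"], ["u1", "v1"]]
rc = ratio_cut(W, parts)                                 # cut = 2: 2/2 + 2/2
nc = normalized_cut(W, parts, ["u0", "u1", "v0", "v1"])  # 2/12 + 2/12
```

Partitioning along the heavy diagonal blocks cuts only the two light edges, giving small cut values, as the spectral relaxation is designed to find.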
The solution of this program is the eigenvector corresponding to the generalized eigenvalue problem \(Ly=\lambda Dy\) [51]. These programming problems can also be modeled as mixed integer programs.
For a large data matrix A, solving this eigenvector problem directly is very difficult, and a method based on the singular value decomposition of a normalized version of A was proposed in [17]. For more details of spectral biclustering, see [22]. So far, only two biclusters are obtained instead of K. For K biclusters, Dhillon [17] applied the k-means algorithm [31, 57] to the indicator vector y; another, direct approach from [23] defines an indicator matrix.
SAMBA. Tanay et al. [54] presented a statistical-algorithmic method for bicluster analysis (SAMBA) based on bipartite graphs and probabilistic modeling. Under a bipartite graph model, the weight of each edge is assigned according to a probabilistic model; thus, finding biclusters of A becomes finding heavy subgraphs of G with high likelihood. The method is motivated by finding complete bipartite subgraphs (bicliques) of G. SAMBA has three steps: forming the bipartite graph and calculating the weights of edges and non-edges (two models are introduced for this step: a simple model and a refined model); applying a hashing technique to find the heaviest bicliques (biclusters) in the graph; and performing a local improvement procedure on the biclusters in each heap.
Given a data matrix A, the corresponding bipartite graph is \(G=(U,V,E)\). A bicluster corresponds to a subgraph \(H=(U^{\prime},V^{\prime},E^{\prime})\) as introduced above. The weight of a subgraph is the sum of the assigned weights of its edges \((u,v)\in E^{\prime}\) and non-edges \((u,v)\in \bar{E}^{\prime}=(U ^{\prime}\times V ^{\prime})\,\backslash\,E^{\prime}\). A subgraph with assigned weights has a statistical significance, and finding a bicluster amounts to searching for a heavy subgraph with respect to this weight. Two models are introduced in [54]: a simple model and a refined model.
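The subgraph weight can be sketched as follows (the constant edge and non-edge weights here are illustrative placeholders; in [54] they are derived as log-likelihood ratios from the probabilistic model):

```python
def subgraph_weight(E, U_sub, V_sub, w_edge=1.0, w_nonedge=-1.0):
    """Weight of the subgraph H = (U', V', E') induced by U_sub x V_sub:
    sum w_edge over pairs present in the edge set E and w_nonedge over
    the missing pairs (the non-edges of the subgraph)."""
    total = 0.0
    for u in U_sub:
        for v in V_sub:
            total += w_edge if (u, v) in E else w_nonedge
    return total

E = {(0, "a"), (0, "b"), (1, "a"), (1, "b")}
dense = subgraph_weight(E, [0, 1], ["a", "b"])      # 4 edges: 4.0
loose = subgraph_weight(E, [0, 1, 2], ["a", "b"])   # 4 edges, 2 non-edges: 2.0
```

Because non-edges carry negative weight, the dense biclique outweighs any enlargement that drags in missing pairs, which is why heavy subgraphs correspond to biclusters.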
In a recent study, Tanay et al. [53] extended SAMBA to integrate multiple types of experimental data.
6.3.4 Based on Information Theory
In [18], Dhillon et al. proposed a biclustering algorithm based on information theory. This information-theoretic biclustering algorithm, which simultaneously clusters both the rows and the columns, is called co-clustering by Dhillon et al.
For the proof of this result, we refer to [18]. An iterative procedure is then used to minimize the transformed objective function [18].
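The quantity being minimized, the loss in mutual information \(I(X;Y)-I(\hat{X};\hat{Y})\) between the original and co-clustered joint distributions, can be computed directly on a toy example (a sketch of the objective only; [18] minimizes this loss over all row and column assignments):

```python
from math import log2

def mutual_information(P):
    """I(X;Y) in bits for a joint distribution given as a matrix P
    (rows index X, columns index Y)."""
    px = [sum(row) for row in P]
    py = [sum(col) for col in zip(*P)]
    return sum(p * log2(p / (px[i] * py[j]))
               for i, row in enumerate(P) for j, p in enumerate(row) if p > 0)

def coclustered(P, row_map, col_map, K, L):
    """Collapse P onto K row clusters and L column clusters."""
    Q = [[0.0] * L for _ in range(K)]
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            Q[row_map[i]][col_map[j]] += p
    return Q

# Normalize a data matrix into a joint distribution:
A = [[4, 4, 0, 0], [4, 4, 0, 0], [0, 0, 4, 4], [0, 0, 4, 4]]
s = sum(map(sum, A))
P = [[a / s for a in row] for row in A]
loss = mutual_information(P) - mutual_information(
    coclustered(P, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2))
# For this block-constant matrix the 2x2 co-clustering is lossless.
```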
6.3.5 Based on Probability
The following two biclustering algorithms (BBC and cMonkey) use probability theory.
BBC. Gu and Liu [26] proposed a Bayesian biclustering model (BBC) and implemented a Gibbs sampling [50] procedure for its statistical inference. This model can also be considered an implementation of the plaid model [34] of biclustering.
In non-overlapping feature biclustering, \(\sum_{k=1}^K\kappa_{jk}\leq 1\), and in non-overlapping sample biclustering, \(\sum_{k=1}^K\delta_{ik}\leq 1\). Here, the non-overlapping sample case is discussed. The priors of the indicators κ and δ are set so that a feature can be in multiple biclusters while a sample cannot be in more than one.
In order to calculate the likelihood term in the above ratio, we need the inverses and determinants of the covariance matrices of the vector \(V_2\) in both cases. For the details of the rest of the BBC algorithm, we refer to [26].
cMonkey. Reiss et al. [46] proposed an integrated biclustering algorithm (cMonkey) for heterogeneous genome-wide data sets, used in the inference of global regulatory networks. In this model, each bicluster is modeled via a Markov chain process in which the bicluster is iteratively optimized and its state is updated based upon conditional probability distributions computed from the cluster's previous state. Three major data types are used (gene expression, upstream sequences, and association networks), and accordingly p-values are computed for three model components: the expression component, the sequence component, and the network component. Here we review only the expression component.
The Markov chain process by which a bicluster is optimized requires “seeding” of the bicluster before the iterative steps begin. The iterative steps then include searching for motifs in the bicluster, computing the conditional probability that each sample/feature is a member of the bicluster, and performing moves sampled from that conditional probability.
6.3.6 Comparison of Biclustering Algorithms
In [44], Prelic et al. used this score to compare the Bimax, CC, OPSM, SAMBA, xMotifs, and ISA algorithms on a data set derived from a metabolic pathway map. In [12], Cho and Dhillon also used this score to compare several biclustering algorithms on human cancer microarray data sets.
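A match score of this kind can be sketched as follows (assuming the average-of-best-Jaccard form used by Prelic et al. [44] for gene sets, applied here to the sample sets of the biclusters):

```python
def match_score(M1, M2):
    """Average, over biclusters in M1, of the best Jaccard overlap of
    their sample sets with any bicluster of M2. Equals 1 when every
    bicluster of M1 is exactly recovered in M2."""
    def jaccard(s1, s2):
        return len(s1 & s2) / len(s1 | s2)
    return sum(max(jaccard(s1, s2) for s2, _ in M2)
               for s1, _ in M1) / len(M1)

# Each bicluster is a (sample_set, feature_set) pair:
found = [({1, 2, 3}, {"a"}), ({4, 5}, {"b"})]
truth = [({1, 2, 3}, {"a"}), ({4, 5, 6}, {"b"})]
score = match_score(found, truth)  # (1 + 2/3) / 2 = 5/6
```

Note the score is asymmetric: comparing found biclusters against a known truth measures recovery, while swapping the arguments measures relevance.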
6.4 Application of Biclustering in Computational Neuroscience
Epilepsy is one of the most common nervous system disorders. It affects about 1% of the world's population, with the highest incidence among infants and the elderly [20, 21]. For many years there have been attempts to control epileptic seizures by electrically stimulating the brain [25]. This alternative method of treatment has been the subject of much study since the approval of the chronic vagus nerve stimulation (VNS) implant for the treatment of intractable seizures [56, 24, 49]. The device consists of an electric stimulator implanted subcutaneously in the chest and connected, via subcutaneous electrical wires, to the left cervical vagus nerve. The VNS device is programmed to deliver electrical stimulation at a set intensity, duration, pulse width, and frequency. Optimal parameters are determined on a case-by-case basis, depending on clinical efficacy (seizure frequency) and tolerability.
Busygin et al. used supervised consistent biclustering [6] to develop a physiologic marker for optimal VNS parameters (e.g., output current, signal frequency) using measures of scalp EEG signals.
Short-term maximum Lyapunov exponent (\(\textrm{STL}_{\max}\)) values were computed for each scalp EEG channel recorded from two epileptic patients, using the algorithm developed by Iasemidis et al. [29]. The \(\textrm{STL}_{\max}\) values were then used as the features of the two data sets. The averaged samples from stimulation periods were separated from the averaged samples from non-stimulation periods by the feature selection performed within the consistent biclustering routine.
As each stimulation lasted 30 s and a 4-s time window was used to compute one element of the Lyapunov exponent time series, each stimulation provided seven data points. Since the EEG patterns of a patient may have changed throughout the observed period due to changes in his/her condition not relevant to the investigated phenomenon, each of the seven samples was averaged across all stimulation cycles. Thus, seven averaged Lyapunov exponent samples were created to represent the positive class. To create the negative class, 10 Lyapunov exponent data points were taken starting 250 s after each stimulation. In the same way, these 10 samples were averaged across all stimulation cycles, so the negative class contains 10 averaged Lyapunov exponent samples from non-stimulation time intervals.
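The sample construction described above can be sketched as follows (the \(\textrm{STL}_{\max}\) values here are synthetic, for illustration only):

```python
def average_across_cycles(cycles):
    """Average the k-th data point across all stimulation cycles,
    turning N cycles of p points each into p averaged samples."""
    p = len(cycles[0])
    return [sum(c[k] for c in cycles) / len(cycles) for k in range(p)]

# Three hypothetical stimulation cycles, seven STLmax points each
# (one point per 4-s window of the 30-s stimulation):
cycles = [
    [0.30, 0.28, 0.27, 0.26, 0.25, 0.25, 0.24],
    [0.32, 0.30, 0.29, 0.28, 0.26, 0.25, 0.25],
    [0.31, 0.29, 0.28, 0.27, 0.27, 0.26, 0.25],
]
positive_class = average_across_cycles(cycles)  # 7 averaged samples
```

The same averaging, applied to the 10 points taken 250 s after each stimulation, yields the 10 negative-class samples.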
Then, the biclustering experiment was performed on two 26 × 17 matrices representing patients A and B. The patient A data were conditionally biclustering-admitting with respect to the given stimulation and non-stimulation classes without excluding any features. All but one feature were classified into the non-stimulation class, which indicates that for almost all EEG channels the Lyapunov exponent was consistently decreasing during stimulation, with one channel being the only exception.
Cross-validation of the obtained biclustering was performed by the leave-one-out method, examining for each sample whether it would be classified into the appropriate class if the feature selection were performed without it. It turned out that the classes of all 17 samples were confirmed by this method.
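The leave-one-out scheme can be sketched generically (with a nearest-centroid rule standing in for the classifier; the study re-ran the consistent-biclustering feature selection itself, which this sketch does not reproduce):

```python
def nearest_centroid(train, labels, x):
    """Classify x by the nearest class centroid (a stand-in classifier,
    used here only to illustrate the leave-one-out loop)."""
    cents = {}
    for lab in set(labels):
        pts = [p for p, l in zip(train, labels) if l == lab]
        cents[lab] = [sum(c) / len(pts) for c in zip(*pts)]
    return min(cents, key=lambda lab: sum((a - b) ** 2
                                          for a, b in zip(cents[lab], x)))

def leave_one_out(samples, labels):
    """For each sample, refit on the remaining samples and count how
    many are assigned back to their own class."""
    hits = 0
    for i in range(len(samples)):
        rest = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        hits += nearest_centroid(rest, rest_labels, samples[i]) == labels[i]
    return hits

samples = [[0.2], [0.25], [0.22], [0.8], [0.85], [0.9]]
labels = ["stim", "stim", "stim", "non", "non", "non"]
confirmed = leave_one_out(samples, labels)  # all 6 classes confirmed
```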
The obtained biclustering results suggest that signals from certain parts of the brain consistently change their characteristics when VNS is switched on and could provide a basis for selecting VNS stimulation parameters. A physiologic marker of optimal VNS effect could greatly reduce the cost, time, and risk of calibrating VNS stimulation parameters in newly implanted patients compared to the current method of assessing clinical response.
6.5 Conclusions
In this review, formal definitions of biclustering with its different types and structures are given, and the algorithms are reviewed from a mathematical perspective.
Biclustering has recently become an active research area with applications in bioinformatics; other application areas include text mining and marketing analysis. In practical applications, issues such as missing data, noise, and data preprocessing strongly influence the results of biclustering. Moreover, the comparison of biclustering algorithms remains a direction for further study.
References
1. Angiulli, F., Cesario, E., Pizzuti, C. Random walk biclustering for microarray data. Inf Sci: Int J 178(6), 1479–1497 (2008)
2. Barkow, S., et al. BicAT: A biclustering analysis toolbox. Bioinformatics 22, 1282–1283 (2006)
3. Ben-Dor, A., Chor, B., Karp, R., Yakhini, Z. Discovering local structure in gene expression data: The order-preserving submatrix problem. J Comput Biol 10, 373–384 (2003)
4. Busygin, S., Prokopyev, O.A., Pardalos, P.M. Feature selection for consistent biclustering via fractional 0–1 programming. J Comb Optim 10(1), 7–21 (2005)
5. Busygin, S., Prokopyev, O.A., Pardalos, P.M. Biclustering in data mining. Comput Oper Res 35, 2964–2987 (2008)
6. Busygin, S., Boyko, N., Pardalos, P., Bewernitz, M., Ghacibeh, G. Biclustering EEG data from epileptic patients treated with vagus nerve stimulation. AIP Conference Proceedings of the Data Mining, Systems Analysis and Optimization in Biomedicine, 220–231 (2007)
7. Califano, A., Stolovitzky, G., Tu, Y. Analysis of gene expression microarrays for phenotype classification. Proceedings of the International Conference on Computational Molecular Biology, 75–85 (2000)
8. Carmona-Saez, P., Pascual-Marqui, R.D., Tirado, F., Carazo, J.M., Pascual-Montano, A. Biclustering of gene expression data by non-smooth non-negative matrix factorization. BMC Bioinformatics 7, 78 (2006)
9. Chaovalitwongse, W.A., Butenko, S., Pardalos, P.M. Clustering Challenges in Biological Networks. World Scientific Publishing, Singapore (2008)
10. Cheng, K.O., et al. BiVisu: Software tool for bicluster detection and visualization. Bioinformatics 23, 2342–2344 (2007)
11. Cheng, Y., Church, G.M. Biclustering of expression data. Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, 93–103 (2000)
12. Cho, H., Dhillon, I.S. Coclustering of human cancer microarrays using minimum sum-squared residue coclustering. IEEE/ACM Trans Comput Biol Bioinform 5(3), 385–400 (2008)
13. Chung, F.R.K. Spectral Graph Theory. CBMS Regional Conference Series in Mathematics, Number 92, American Mathematical Society (1997)
14. CPLEX: ILOG CPLEX 9.0 User's Manual (2005)
15. Data Clustering. http://en.wikipedia.org/wiki/Data_clustering, accessed Dec. 8 (2008)
16. Data Transformation Steps. http://www.dmg.org/v2-0/Transformations.html, accessed Dec. 8 (2008)
17. Dhillon, I.S. Co-clustering documents and words using bipartite spectral graph partitioning. Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 269–274 (2001)
18. Dhillon, I.S., Mallela, S., Modha, D.S. Information theoretic co-clustering. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 89–98 (2003)
19. DiMaggio, P.A., McAllister, S.R., Floudas, C.A., Feng, X.J., Rabinowitz, J.D., Rabitz, H.A. Biclustering via optimal re-ordering of data matrices in systems biology: Rigorous methods and comparative studies. BMC Bioinformatics 9, 458 (2008)
20. Engel, J. Jr. Seizures and Epilepsy. F. A. Davis Co., Philadelphia, PA (1989)
21. Engel, J. Jr., Pedley, T.A. Epilepsy: A Comprehensive Textbook. Lippincott-Raven, Philadelphia, PA (1997)
22. Fan, N., Chinchuluun, A., Pardalos, P.M. Integer programming of biclustering based on graph models. In: Chinchuluun, A., Pardalos, P.M., Enkhbat, R., Tseveendorj, I. (eds.) Optimization and Optimal Control: Theory and Applications. Springer (2009)
23. Fan, N., Pardalos, P.M. Linear and quadratic programming approaches for the general graph partitioning problem. J Global Optim, DOI 10.1007/s10898-009-9520-1 (2010)
24. Fisher, R.S., Krauss, G.L., Ramsay, E., Laxer, K., Gates, J. Assessment of vagus nerve stimulation for epilepsy: Report of the Therapeutics and Technology Assessment Subcommittee of the American Academy of Neurology. Neurology 49, 293–297 (1997)
25. Fisher, R.S., Theodore, W.H. Brain stimulation for epilepsy. Lancet Neurol 3(2), 111–118 (2004)
26. Gu, J., Liu, J.S. Bayesian biclustering of gene expression data. BMC Genomics 9(Suppl 1), S4 (2008)
27. Hagen, L., Kahng, A.B. New spectral methods for ratio cut partitioning and clustering. IEEE Trans Computer-Aided Design 11(9), 1074–1085 (1992)
28. Hartigan, J.A. Direct clustering of a data matrix. J Am Stat Assoc 67, 123–129 (1972)
29. Iasemidis, L.D., Principe, J.C., Sackellares, J.C. Measurement and quantification of spatiotemporal dynamics of human epileptic seizures. In: Akay, M. (ed.) Nonlinear Signal Processing in Medicine. IEEE Press (1999)
30. Ihmels, J., Friedlander, G., Bergmann, S., Sarig, O., Ziv, Y., Barkai, N. Revealing modular organization in the yeast transcriptional network. Nat Genet 31(4), 370–377 (2002)
31. Jain, A.K., Murty, M.N., Flynn, P.J. Data clustering: A review. ACM Comput Surv 31(3), 264–323 (1999)
32. Kaiser, S., Leisch, F. A toolbox for bicluster analysis in R. Tech. Rep. 028, Ludwig-Maximilians-Universität München (2008)
33. Kluger, Y., Basri, R., Chang, J.T., Gerstein, M. Spectral biclustering of microarray cancer data: Co-clustering genes and conditions. Genome Res 13, 703–716 (2003)
34. Lazzeroni, L., Owen, A. Plaid models for gene expression data. Stat Sinica 12, 61–86 (2002)
35. Lee, D.D., Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)
36. Liu, J., Wang, W. OP-cluster: Clustering by tendency in high dimensional space. Proceedings of the Third IEEE International Conference on Data Mining, 187–194 (2003)
37. Madeira, S.C., Oliveira, A.L. Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Trans Comput Biol Bioinform 1(1), 24–45 (2004)
38. Madeira, S.C., Oliveira, A.L. A linear time biclustering algorithm for time series gene expression data. Lect Notes Comput Sci 3692, 39–52 (2005)
39. Murali, T.M., Kasif, S. Extracting conserved gene expression motifs from gene expression data. Pacific Symp Biocomput 8, 77–88 (2003)
40. Pardalos, P.M., Busygin, S., Prokopyev, O.A. On biclustering with feature selection for microarray data sets. In: Mondaini, R. (ed.) BIOMAT 2005 – International Symposium on Mathematical and Computational Biology, pp. 367–378. World Scientific, Singapore (2006)
41. Pardalos, P.M., Chaovalitwongse, W., Iasemidis, L.D., Sackellares, J.C., Shiau, D.S., Carney, P.R., Prokopyev, O.A., Yatsenko, V.A. Seizure warning algorithm based on optimization and nonlinear dynamics. Math Program 101(2), 365–385 (2004)
42. Pardalos, P.M., Chaovalitwongse, W., Prokopyev, O. Electroencephalogram (EEG) time series classification: Application in epilepsy. Ann Oper Res (2006)
43. Pascual-Montano, A., Carazo, J.M., Kochi, K., Lehmann, D., Pascual-Marqui, R.D. Nonsmooth nonnegative matrix factorization (nsNMF). IEEE Trans Pattern Anal Mach Intell 28, 403–415 (2006)
44. Prelic, A., Bleuler, S., Zimmermann, P., Wille, A., Buhlmann, P., Gruissem, W., Hennig, L., Thiele, L., Zitzler, E. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22(9), 1122–1129 (2006)
45. Rege, M., Dong, M., Fotouhi, F. Bipartite isoperimetric graph partitioning for data co-clustering. Data Min Knowl Disc 16, 276–312 (2008)
46. Reiss, D.J., Baliga, N.S., Bonneau, R. Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks. BMC Bioinformatics 7, 280 (2006)
47. Richards, A.L., Holmans, P.A., O'Donovan, M.C., Owen, M.J., Jones, L. A comparison of four clustering methods for brain expression microarray data. BMC Bioinformatics 9, 490 (2008)
48. Santamaria, R., Theron, R., Quintales, L. BicOverlapper: A tool for bicluster visualization. Bioinformatics 24, 1212–1213 (2008)
49. Schachter, S.C., Wheless, J.W. (eds.) Vagus nerve stimulation therapy 5 years after approval: A comprehensive update. Neurology 59(Suppl 4) (2002)
50. Sheng, Q., Moreau, Y., De Moor, B. Biclustering microarray data by Gibbs sampling. Bioinformatics 19, 196–205 (2003)
51. Shi, J., Malik, J. Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8), 888–905 (2000)
52. Supper, J., Strauch, M., Wanke, D., Harter, K., Zell, A. EDISA: Extracting biclusters from multiple time-series of gene expression profiles. BMC Bioinformatics 8, 334 (2007)
53. Tanay, A., Sharan, R., Kupiec, M., Shamir, R. Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genome-wide data. Proc Natl Acad Sci USA 101, 2981–2986 (2004)
54. Tanay, A., Sharan, R., Shamir, R. Discovering statistically significant biclusters in gene expression data. Bioinformatics 18, S136–S144 (2002)
55. Tanay, A., Sharan, R., Shamir, R. Biclustering algorithms: A survey. In: Aluru, S. (ed.) Handbook of Computational Molecular Biology. Chapman & Hall, London (2005)
56. Uthman, B.M., Wilder, B.J., Penry, J.K., Dean, C., Ramsay, R.E., Reid, S.A., Hammond, E.J., Tarver, W.B., Wernicke, J.F. Treatment of epilepsy by stimulation of the vagus nerve. Neurology 43, 1338–1345 (1993)
57. Xu, R., Wunsch, D. II. Survey of clustering algorithms. IEEE Trans Neural Netw 16(3), 645–678 (2005)
58. Yang, J., Wang, W., Wang, H., Yu, P. δ-Clusters: Capturing subspace correlation in a large data set. Proceedings of the 18th IEEE International Conference on Data Engineering, 517–528 (2002)
59. Yang, J., Wang, W., Wang, H., Yu, P. Enhanced biclustering on expression data. Proceedings of the Third IEEE Conference on Bioinformatics and Bioengineering, 321–327 (2003)
60. Zha, H., He, X., Ding, C., Simon, H., Gu, M. Bipartite graph partitioning and data clustering. Proceedings of the Tenth International Conference on Information and Knowledge Management, 25–32 (2001)
61. Zhao, H., Liew, A.W.C., Xie, X., Yan, H. A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data. J Theor Biol 251, 264–274 (2008)