# A Robust Biclustering Method Based on Crossing Minimization in Bipartite Graphs

## Abstract

Clustering refers to the process of organizing a set of input vectors into clusters based on similarity defined according to some preset distance measure. In many cases it is more desirable to cluster the dimensions simultaneously with the vectors themselves. This special instance of clustering, referred to as *biclustering*, was introduced by Hartigan [3]. It has many applications in areas including data mining, pattern recognition, and computational biology, and it has received considerable attention in gene expression data analysis; see [5] for a survey. The input is represented as a data matrix whose rows and columns correspond to genes and conditions, respectively. Each entry in the matrix reflects the expression level of a gene under a certain condition. From a graph-theoretical perspective, the data matrix can be viewed as a weighted bipartite graph in which the vertex set of one partition is the set of genes and the vertex set of the other partition is the set of conditions. A weighted edge between a *gene-condition* pair, when present, reflects the expression level of the gene under that specific experimental condition. The biclustering problem may then be described in terms of the various versions of the biclique extraction problem in bipartite graphs. Many interesting versions that apply directly to the biclustering problem are NP-hard [4]. Various graph-theoretical approaches employing heuristics have been suggested [1,4,6,7].
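To make the matrix-to-graph view concrete, the following is a minimal Python sketch, not the authors' method. It turns a genes-by-conditions expression matrix into a weighted bipartite edge map and checks whether a candidate bicluster (a gene subset paired with a condition subset) forms a biclique. The function names `matrix_to_bipartite` and `is_biclique` are hypothetical, and the rule that an edge exists only when the absolute expression level exceeds a `threshold` is an assumption introduced for illustration; the text does not specify when an edge is present.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: each edge maps a (gene, condition) pair to its weight.
Edge = Tuple[str, str]


def matrix_to_bipartite(
    genes: List[str],
    conditions: List[str],
    matrix: List[List[float]],
    threshold: float = 0.0,
) -> Dict[Edge, float]:
    """Build a weighted bipartite graph from a genes x conditions matrix.

    ASSUMPTION: an edge is created only when |expression| > threshold;
    this rule is illustrative and not taken from the source.
    """
    edges: Dict[Edge, float] = {}
    for i, gene in enumerate(genes):
        for j, cond in enumerate(conditions):
            weight = matrix[i][j]
            if abs(weight) > threshold:
                edges[(gene, cond)] = weight
    return edges


def is_biclique(edges: Dict[Edge, float], gene_set: Set[str], cond_set: Set[str]) -> bool:
    """Return True if every gene in gene_set is adjacent to every condition in cond_set."""
    return all((g, c) in edges for g in gene_set for c in cond_set)


# Small illustrative example (made-up values, not data from the paper).
genes = ["g1", "g2", "g3"]
conditions = ["c1", "c2", "c3"]
matrix = [
    [2.0, 1.5, 0.0],
    [1.8, 2.2, 0.0],
    [0.0, 0.0, 3.1],
]
edges = matrix_to_bipartite(genes, conditions, matrix)
print(len(edges))                                    # 5
print(is_biclique(edges, {"g1", "g2"}, {"c1", "c2"}))  # True
print(is_biclique(edges, {"g1", "g3"}, {"c1"}))        # False: no g3-c1 edge
```

Here genes `g1` and `g2` together with conditions `c1` and `c2` induce a complete bipartite subgraph, which is the graph-theoretical counterpart of a bicluster; the NP-hard versions mentioned above arise when such bicliques must be found with maximum size or weight rather than merely checked.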

## References

- 1. Abdullah, A., Hussain, A.: A new biclustering technique based on crossing minimization. Neurocomputing 69(16-18), 1882–1896 (2006)
- 2. Çakiroglu, O., Erten, C., Karatas, Ö., Sözdinler, M.: Crossing minimization in weighted bipartite graphs. In: Demetrescu, C. (ed.) WEA 2007. LNCS, vol. 4525, pp. 122–135. Springer, Heidelberg (2007)
- 3. Hartigan, J.A.: Direct clustering of a data matrix. Journal of the American Statistical Association 67(337), 123–129 (1972)
- 4. Lonardi, S., Szpankowski, W., Yang, Q.: Finding biclusters by random projections. Theor. Comput. Sci. 368(3), 217–230 (2006)
- 5. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Trans. on Comp. Biol. and Bioinf. (TCBB) 1(1), 24–45 (2004)
- 6. Mishra, N., Ron, D., Swaminathan, R.: On finding large conjunctive clusters. In: COLT, pp. 448–462 (2003)
- 7. Tanay, A., Sharan, R., Shamir, R.: Discovering statistically significant biclusters in gene expression data. Bioinformatics 18(suppl. 1), 136–144 (2002)