Abstract
Clustering is a method of unsupervised learning and a common technique for statistical data analysis, used in many fields including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Recently, biclustering (or co-clustering), which performs simultaneous clustering on the row and column dimensions of the data matrix, has been shown to be remarkably effective in a variety of applications. In this paper we propose a novel approach to biclustering gene expression data based on Modular Singular Value Decomposition (Mod-SVD). Instead of applying SVD directly to the data matrix, the proposed approach computes the SVD in a modular fashion. Experiments conducted on synthetic and real datasets demonstrate the effectiveness of the algorithm on gene expression data.
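The contrast between a global SVD and a modular one can be illustrated with a small NumPy sketch. It is not the authors' Mod-SVD algorithm: the regular grid partition, the hypothetical helpers `modular_svd` and `rank1_energy`, and the idea of scoring each module by how much of its energy the leading singular value captures are all assumptions made for illustration. The sketch splits the matrix into sub-matrices ("modules"), computes a thin SVD of each module on its own, and flags modules that are close to rank one, since a block of genes that co-vary coherently across a block of conditions behaves like a rank-one bicluster.

```python
import numpy as np


def modular_svd(X, n_row_modules, n_col_modules):
    """Split X into a grid of sub-matrices ("modules") and compute the SVD
    of each module independently.

    Hypothetical helper: the regular grid partition is an assumption of
    this sketch, not the partitioning scheme used in the paper.
    Returns a dict mapping (i, j) -> (row_idx, col_idx, U, s, Vt).
    """
    row_blocks = np.array_split(np.arange(X.shape[0]), n_row_modules)
    col_blocks = np.array_split(np.arange(X.shape[1]), n_col_modules)
    modules = {}
    for i, rows in enumerate(row_blocks):
        for j, cols in enumerate(col_blocks):
            block = X[np.ix_(rows, cols)]
            # Thin SVD of this module alone; other modules are not used.
            U, s, Vt = np.linalg.svd(block, full_matrices=False)
            modules[(i, j)] = (rows, cols, U, s, Vt)
    return modules


def rank1_energy(s):
    """Fraction of a module's squared Frobenius norm captured by its leading
    singular value. Values near 1 mean the module is close to rank one
    (a heuristic coherence score assumed here for illustration)."""
    return float(s[0] ** 2 / np.sum(s ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(40, 20))  # background noise
    X[:20, :10] += 3.0                        # planted coherent block: rows 0-19, cols 0-9

    modules = modular_svd(X, n_row_modules=2, n_col_modules=2)
    scores = {key: rank1_energy(m[3]) for key, m in modules.items()}
    best = max(scores, key=scores.get)
    print(best, round(scores[best], 2))
```

On this synthetic input the module containing the planted block scores far higher than the pure-noise modules. A fixed grid is the simplest possible choice; a real method would need a way to choose or refine module boundaries so that biclusters not aligned with the grid can still be recovered.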
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aradhya, V.N.M., Masulli, F., Rovetta, S. (2010). A Novel Approach for Biclustering Gene Expression Data Using Modular Singular Value Decomposition. In: Masulli, F., Peterson, L.E., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2009. Lecture Notes in Computer Science, vol 6160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14571-1_19
DOI: https://doi.org/10.1007/978-3-642-14571-1_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14570-4
Online ISBN: 978-3-642-14571-1
eBook Packages: Computer Science, Computer Science (R0)