Abstract
This paper concerns the discovery of Order Preserving Clusters (OP-Clusters) in gene expression data, in each of which a subset of genes induces a similar linear ordering along a subset of conditions. After each gene vector is converted into an ordered label sequence, the problem becomes that of finding frequent orders appearing in the sequence set. We propose an algorithm that finds the frequent orders by iteratively Combining the most Frequent Prefixes and Suffixes (CFPS) in a statistical way. We also define the significance of an OP-Cluster. Our method scales well with the dimension of the dataset and the size of the cluster. Experimental studies on both synthetic datasets and a real gene expression dataset show that our approach is effective and efficient.
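The conversion step mentioned above can be illustrated with a minimal sketch. It is not the CFPS algorithm itself: it only shows one plausible way to turn a gene vector into an ordered label sequence (condition indices sorted by ascending expression) and to count how many genes support a candidate order. The function names, the toy data, and the tie-breaking by condition index are assumptions made for illustration.

```python
from typing import List, Sequence


def to_label_sequence(expression: Sequence[float]) -> List[int]:
    """Return condition indices sorted by ascending expression value.

    Assumption: ties are broken by condition index (sorted() is stable).
    """
    return sorted(range(len(expression)), key=lambda c: expression[c])


def supports_order(sequence: Sequence[int], order: Sequence[int]) -> bool:
    """True if `order` occurs in `sequence` as a (possibly gapped) subsequence."""
    it = iter(sequence)
    # `label in it` consumes the iterator, so matches must appear in order.
    return all(label in it for label in order)


def support(sequences: Sequence[Sequence[int]], order: Sequence[int]) -> int:
    """Number of gene sequences that preserve the candidate order."""
    return sum(supports_order(s, order) for s in sequences)


# Hypothetical toy data: 3 genes measured under 4 conditions.
genes = [
    [0.1, 0.9, 0.4, 0.7],
    [1.0, 5.0, 2.0, 3.0],
    [0.8, 0.2, 0.5, 0.3],
]
sequences = [to_label_sequence(g) for g in genes]
print(sequences)
print(support(sequences, [0, 3, 1]))
```

In this toy example the first two genes induce the same ordering of conditions even though their absolute expression levels differ, which is the kind of pattern an OP-Cluster is meant to capture.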
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Teng, L., Chan, L. (2007). Order Preserving Clustering by Finding Frequent Orders in Gene Expression Data. In: Rajapakse, J.C., Schmidt, B., Volkert, G. (eds) Pattern Recognition in Bioinformatics. PRIB 2007. Lecture Notes in Computer Science, vol 4774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75286-8_22
DOI: https://doi.org/10.1007/978-3-540-75286-8_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75285-1
Online ISBN: 978-3-540-75286-8
eBook Packages: Computer Science (R0)