Order Preserving Clustering by Finding Frequent Orders in Gene Expression Data

Teng, Li; Chan, Laiwan

doi:10.1007/978-3-540-75286-8_22


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4774)

Abstract

This paper concerns the discovery of Order Preserving Clusters (OP-Clusters) in gene expression data, in each of which a subset of genes induces a similar linear ordering along a subset of conditions. After converting each gene vector into an ordered label sequence, the problem is transformed into finding frequent orders appearing in the sequence set. We propose an algorithm for finding the frequent orders by iteratively Combining the most Frequent Prefixes and Suffixes (CFPS) in a statistical way. We also define the significance of an OP-Cluster. Our method scales well with the dimension of the dataset and the size of the cluster. Experimental studies on both synthetic datasets and a real gene expression dataset show that our approach is effective and efficient.
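
As an illustration of the conversion step described in the abstract, the following minimal sketch turns each gene vector into an ordered label sequence (condition indices sorted by ascending expression value) and counts how many genes support a given order. It is not the CFPS algorithm itself, whose details are not given here. The function names, the toy expression matrix, and the tie-breaking rule (ties resolved by condition index) are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def to_label_sequence(row):
    """Return condition indices sorted by ascending expression value.

    Assumption: ties are broken by condition index (sorted() is stable).
    """
    return tuple(sorted(range(len(row)), key=lambda j: row[j]))


def supports(seq, order):
    """True if `order` occurs in `seq` as a (possibly non-contiguous) subsequence."""
    it = iter(seq)
    return all(label in it for label in order)


def order_support(sequences, order):
    """Number of sequences (genes) that preserve the given order of conditions."""
    return sum(supports(seq, order) for seq in sequences)


def pair_order_counts(sequences):
    """For every ordered pair (a, b), count the sequences that place a before b."""
    counts = Counter()
    for seq in sequences:
        for a, b in combinations(seq, 2):  # a precedes b within seq
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: 3 genes (rows) measured under 4 conditions (columns).
expression = [
    [0.1, 0.9, 0.5, 0.3],
    [1.0, 4.0, 3.0, 2.0],
    [0.7, 0.2, 0.9, 0.4],
]
sequences = [to_label_sequence(row) for row in expression]
pair_counts = pair_order_counts(sequences)
```

In this toy matrix the first two genes yield the same label sequence, so any order drawn from it is supported by both; a frequent order together with the genes supporting it corresponds to a candidate OP-Cluster in the sense of the abstract.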



Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
Li Teng & Laiwan Chan


Editor information

Jagath C. Rajapakse, Bertil Schmidt, Gwenn Volkert


Cite this paper

Teng, L., Chan, L. (2007). Order Preserving Clustering by Finding Frequent Orders in Gene Expression Data. In: Rajapakse, J.C., Schmidt, B., Volkert, G. (eds) Pattern Recognition in Bioinformatics. PRIB 2007. Lecture Notes in Computer Science, vol 4774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75286-8_22


DOI: https://doi.org/10.1007/978-3-540-75286-8_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75285-1
Online ISBN: 978-3-540-75286-8
eBook Packages: Computer Science, Computer Science (R0)
