Subspace Clustering of Microarray Data Based on Domain Transformation

Jun, Jongeun; Chung, Seokkyung; McLeod, Dennis

doi:10.1007/11960669_3

Jongeun Jun, Seokkyung Chung & Dennis McLeod

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4316)

Abstract

We propose a mining framework that supports the identification of useful knowledge through data clustering. Motivated by recent advances in microarray technology, we focus on mining gene expression datasets. In particular, because genes are often co-expressed under only a subset of experimental conditions, we present a novel subspace clustering algorithm. In contrast to previous approaches, our method is based on the observation that the number of subspace clusters is related to the number of maximal subspace clusters to which any gene pair can belong. After the gene expression profiles are discretized, the similarity between two genes is represented as a sequence of symbols that encodes the maximal subspace cluster for that gene pair. This domain transformation (from genes to gene-gene relations) makes the number of possible subspace clusters depend on the number of genes. Based on these symbolic representations of genes, we present an efficient subspace clustering algorithm that scales with the number of dimensions. In addition, the running time can be reduced drastically by using an inverted index and pruning uninteresting subspaces. Experimental results indicate that the proposed method efficiently identifies co-expressed gene subspace clusters in a yeast cell cycle dataset.
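
The abstract does not spell out the discretization scheme or the index layout, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes equal-width binning of each gene's own profile, treats the maximal subspace of a gene pair as the set of conditions where both genes carry the same symbol, and uses an inverted index keyed by (condition, symbol) to find such pairs. The function names, the `n_bins` and `min_dims` parameters, and the toy profiles are all hypothetical and not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def discretize(profile, n_bins=3):
    """Turn one expression profile into bin symbols (assumed equal-width bins)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0] * len(profile)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in profile]


def build_inverted_index(symbols):
    """Map each (condition, symbol) key to the set of genes carrying it."""
    index = defaultdict(set)
    for gene, seq in symbols.items():
        for cond, sym in enumerate(seq):
            index[(cond, sym)].add(gene)
    return index


def pairwise_maximal_subspaces(symbols, min_dims=2):
    """For each gene pair, collect the conditions where both share a symbol.

    Pairs agreeing on fewer than min_dims conditions are pruned
    (an assumed stand-in for the paper's pruning of uninteresting subspaces).
    """
    shared = defaultdict(set)
    for (cond, _sym), genes in build_inverted_index(symbols).items():
        for a, b in combinations(sorted(genes), 2):
            shared[(a, b)].add(cond)
    return {pair: sorted(conds)
            for pair, conds in shared.items()
            if len(conds) >= min_dims}


# Toy data (hypothetical): three genes measured under four conditions.
profiles = {
    "g1": [1.0, 9.0, 5.0, 2.0],
    "g2": [2.0, 10.0, 1.0, 3.0],
    "g3": [9.0, 1.0, 5.0, 8.0],
}
symbols = {gene: discretize(values) for gene, values in profiles.items()}
pairs = pairwise_maximal_subspaces(symbols, min_dims=2)
print(pairs)  # {('g1', 'g2'): [0, 1, 3]}
```

In this toy run, g1 and g2 agree on conditions 0, 1, and 3, so that set is the maximal subspace for the pair; g1 and g3 agree on a single condition only and are pruned. Only gene pairs that co-occur in some index entry are ever enumerated, which is the kind of saving an inverted index offers over comparing every pair on every condition.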

Author information

Authors and Affiliations

Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, USA
Jongeun Jun & Dennis McLeod
Yahoo! Inc., 2821 Mission College Blvd, Santa Clara, CA, 95054, USA
Seokkyung Chung

Editor information

Editors and Affiliations

School of Informatics, Indiana University, 901 E. 10th Street, Bloomington, IN, 47408, USA
Mehmet M. Dalkilic & Sun Kim
EECS Department, Case Western Reserve Univ., 10900 Euclid Ave, Cleveland, OH, 44106, USA
Jiong Yang

About this paper

Cite this paper

Jun, J., Chung, S., McLeod, D. (2006). Subspace Clustering of Microarray Data Based on Domain Transformation. In: Dalkilic, M.M., Kim, S., Yang, J. (eds) Data Mining and Bioinformatics. VDMB 2006. Lecture Notes in Computer Science, vol 4316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11960669_3

DOI: https://doi.org/10.1007/11960669_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68970-6
Online ISBN: 978-3-540-68971-3
eBook Packages: Computer Science, Computer Science (R0)
