Determining distinct clusters in gene expression data using similarity in principal component subspaces

Jonnalagadda, Sudhakar; Srinivasan, Rajagopalan

doi:10.1007/s12572-012-0055-1

Published: 25 April 2012

International Journal of Advances in Engineering Sciences and Applied Mathematics, Volume 4, pages 41–51 (2012)

Abstract

Clustering is routinely used in gene expression data analysis to identify groups of co-expressed genes. Commonly used clustering algorithms, however, require the user to specify the number of clusters a priori. We have developed a method that identifies, from a set of candidate partitions, the one with the maximal number of distinct clusters. Principal component analysis (PCA) is used to characterize each cluster by its dominant eigenvectors, which describe the correlation structure among the constituent genes. The similarity between each pair of clusters is measured by the angle between their principal component subspaces, and a cluster is deemed ‘distinct’ if it shows low similarity to all other clusters in the partition. The method assigns each candidate partition a cumulative measure of the distinctness of all its clusters, called the Net Principal Subspace Information (NEPSI) index. The candidate partition with the highest NEPSI index value has the maximal number of distinct clusters and is selected as the ‘best’ partition. We illustrate the efficacy of the proposed method on two gene expression datasets using two different clustering algorithms: k-means and model-based clustering. The results are also compared with those obtained using the Bayesian Information Criterion (BIC).
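The abstract does not give the NEPSI formula, so the Python sketch below only illustrates the general pipeline it describes, under loudly stated assumptions. Each cluster is summarized by an orthonormal basis of its top-k principal directions (covariance-based PCA over the conditions, an assumption); pairwise similarity is Krzanowski's (1979) subspace similarity factor, the mean squared cosine of the principal angles; and, as a hypothetical stand-in for NEPSI (not the authors' definition), each cluster's distinctness is taken as one minus its largest similarity to any other cluster, summed over the partition. The function names `pca_basis`, `subspace_similarity`, `partition_score` and `select_partition` and the toy matrix are illustrative only.

```python
import numpy as np


def pca_basis(X, k):
    """Orthonormal basis (p x k) for the top-k principal directions of one cluster.

    X is an (n_genes x p_conditions) array holding the expression profiles of the
    genes in a single cluster. Covariance-based PCA is an assumption here.
    """
    X = np.asarray(X, dtype=float)
    if k > min(X.shape):
        raise ValueError("k must not exceed min(n_genes, n_conditions)")
    Xc = X - X.mean(axis=0)  # centre each condition (column)
    # Rows of Vt are eigenvectors of Xc.T @ Xc, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T


def subspace_similarity(L, M):
    """Krzanowski similarity factor: mean squared cosine of the principal angles.

    L and M are (p x k) orthonormal bases. Returns 1.0 for identical subspaces
    and 0.0 for mutually orthogonal ones.
    """
    k = L.shape[1]
    return float(np.trace(L.T @ M @ M.T @ L)) / k


def partition_score(X, labels, k=2):
    """HYPOTHETICAL distinctness score; NOT the paper's NEPSI formula.

    Distinctness of a cluster = 1 - (max similarity to any other cluster);
    the partition score is the sum of the distinctness values.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters to compare subspaces")
    bases = [pca_basis(X[labels == c], k) for c in clusters]
    score = 0.0
    for i, Li in enumerate(bases):
        sims = [subspace_similarity(Li, Lj) for j, Lj in enumerate(bases) if j != i]
        score += 1.0 - max(sims)
    return score


def select_partition(X, candidate_labels, k=2):
    """Return (index of the highest-scoring candidate, list of all scores)."""
    scores = [partition_score(X, labels, k) for labels in candidate_labels]
    return int(np.argmax(scores)), scores


# Toy usage (hypothetical data): 6 genes x 3 conditions. Genes 0-2 vary only in
# the first condition, genes 3-5 only in the second, so their 1-D subspaces are
# orthogonal.
X = np.array([[1, 0, 0], [2, 0, 0], [3, 0, 0],
              [0, 1, 5], [0, 2, 5], [0, 3, 5]], dtype=float)
candidates = [[0, 0, 0, 1, 1, 1],   # groups genes by their pattern
              [0, 1, 0, 1, 0, 1]]   # mixes the two patterns
best, scores = select_partition(X, candidates, k=1)
# best == 0, and scores[0] equals 2.0 up to floating-point rounding
```

In this stand-in, a cluster contributes close to 1 only if its subspace is far from every other cluster's, and close to 0 if it nearly duplicates one, so the sum behaves roughly like a count of distinct clusters. The paper's actual index, its PCA scaling, and its choice of k per cluster may differ.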

Author information

Authors and Affiliations

Department of Chemical and Biomolecular Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore, 119260, Singapore
Sudhakar Jonnalagadda & Rajagopalan Srinivasan

Corresponding author

Correspondence to Rajagopalan Srinivasan.

About this article

Cite this article

Jonnalagadda, S., Srinivasan, R. Determining distinct clusters in gene expression data using similarity in principal component subspaces. Int J Adv Eng Sci Appl Math 4, 41–51 (2012). https://doi.org/10.1007/s12572-012-0055-1

Issue Date: June 2012
DOI: https://doi.org/10.1007/s12572-012-0055-1
