Distance Based Feature Selection for Clustering Microarray Data

Dash, Manoranjan; Gopalkrishnan, Vivekanand

doi:10.1007/978-3-540-78568-2_41

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4947)

Abstract

In microarray data analysis, clustering is a fundamental task for separating genes into biologically functional groups or for classifying tissues and phenotypes. Gene expression microarray technologies now allow the expression levels of thousands of genes (features) to be measured simultaneously in a single experiment. The large number of genes, many of them noisy, makes cluster analysis highly complex. This challenge has raised the demand for feature selection, an effective dimensionality reduction technique that removes noisy features. In this paper we propose a novel filter method for feature selection. The method, called ClosestFS, is based on a distance measure: for each feature, the distance is evaluated by computing the feature's impact on the histogram for the whole data. Our experimental results show that the clustering quality of the K-means algorithm, evaluated by several widely used measures, is significantly better with ClosestFS as a pre-processing step than with K-means alone.
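
The abstract gives only the outline of ClosestFS, so the following is a minimal sketch of one possible reading of "impact on the histogram", not the authors' algorithm. It assumes that the histogram is built from pairwise Euclidean distances between samples, that a feature's impact is the L1 distance between the histogram with and without that feature, and that features with larger impact are kept. The function names, the bin count, and the ranking direction are all hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(rows, features):
    """Euclidean distances between all pairs of rows, using only `features`."""
    return [
        math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))
        for a, b in combinations(rows, 2)
    ]


def normalized_histogram(values, bins=10):
    """Bin fractions of `values` after rescaling them by their maximum."""
    top = max(values)
    counts = [0] * bins
    for v in values:
        # A zero maximum means every distance is zero: all mass in bin 0.
        idx = min(int(v / top * bins), bins - 1) if top > 0 else 0
        counts[idx] += 1
    return [c / len(values) for c in counts]


def feature_impacts(rows, bins=10):
    """Assumed impact score: L1 distance between the distance histogram
    over all features and the histogram with one feature left out."""
    all_features = range(len(rows[0]))
    base = normalized_histogram(pairwise_distances(rows, all_features), bins)
    impacts = []
    for f in all_features:
        rest = [g for g in all_features if g != f]
        hist = normalized_histogram(pairwise_distances(rows, rest), bins)
        impacts.append(sum(abs(p - q) for p, q in zip(base, hist)))
    return impacts


def select_features(rows, k, bins=10):
    """Indices of the k features with the largest assumed impact."""
    impacts = feature_impacts(rows, bins)
    ranked = sorted(range(len(impacts)), key=lambda f: impacts[f], reverse=True)
    return sorted(ranked[:k])


# Toy data: feature 0 separates two groups, features 1 and 2 are constant.
rows = [
    [0.0, 5.0, 1.0],
    [0.1, 5.0, 1.0],
    [10.0, 5.0, 1.0],
    [10.1, 5.0, 1.0],
]
print(feature_impacts(rows))
print(select_features(rows, k=1))
```

As a filter method, the selection runs once before clustering and independently of it. The columns returned by `select_features` would then be passed to K-means in place of the full feature set.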

Author information

Authors and Affiliations

Nanyang Technological University, 50 Nanyang Avenue, Singapore
Manoranjan Dash & Vivekanand Gopalkrishnan

Editor information

Jayant R. Haritsa, Ramamohanarao Kotagiri, Vikram Pudi

Cite this paper

Dash, M., Gopalkrishnan, V. (2008). Distance Based Feature Selection for Clustering Microarray Data. In: Haritsa, J.R., Kotagiri, R., Pudi, V. (eds) Database Systems for Advanced Applications. DASFAA 2008. Lecture Notes in Computer Science, vol 4947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78568-2_41

DOI: https://doi.org/10.1007/978-3-540-78568-2_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78567-5
Online ISBN: 978-3-540-78568-2
eBook Packages: Computer Science, Computer Science (R0)
