Distance Based Feature Selection for Clustering Microarray Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4947)

Abstract

In microarray data analysis, clustering is a fundamental task for separating genes into biologically functional groups or for classifying tissues and phenotypes. With recent gene expression microarray technologies, the expression levels of thousands of genes (features) can be measured simultaneously in a single experiment. The large number of genes, many of them noisy, makes cluster analysis highly complex. This challenge has raised the demand for feature selection, an effective dimensionality reduction technique that removes noisy features. In this paper we propose a novel filter method for feature selection. The method, called ClosestFS, is based on a distance measure: for each feature, the distance is evaluated by computing the feature's impact on the histogram of the whole data. Our experimental results show that the quality of the clustering produced by the K-means algorithm with ClosestFS as a pre-processing step, evaluated by several widely used measures, is significantly better than that of plain K-means.
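
The abstract does not spell out how ClosestFS computes a feature's impact on the histogram, so the following Python sketch is only an illustrative stand-in for the general idea of a distance-based filter followed by K-means. It is not the authors' algorithm. Everything in it is an assumption made for illustration: the score (how much the normalised histogram of pairwise distances changes when a feature is dropped), the rule of keeping the highest-scoring features, the names `histogram_impact_scores`, `select_features` and `kmeans`, the bin count, and the synthetic data.

```python
import numpy as np


def histogram_impact_scores(X, bins=20):
    """Hypothetical per-feature score (an assumption, not ClosestFS):
    L1 change in the normalised pairwise-distance histogram when the
    feature is left out of the distance computation."""
    n, d = X.shape
    iu = np.triu_indices(n, k=1)                    # each pair of rows once
    diff = X[:, None, :] - X[None, :, :]            # shape (n, n, d)
    per_feature = diff[iu] ** 2                     # shape (pairs, d)
    full = per_feature.sum(axis=1)                  # squared distance, all features

    def norm_hist(v):
        # Scale to [0, 1] so histograms with and without a feature are comparable.
        h, _ = np.histogram(v / max(v.max(), 1e-12), bins=bins, range=(0.0, 1.0))
        return h / h.sum()

    ref = norm_hist(full)
    scores = np.empty(d)
    for j in range(d):
        reduced = full - per_feature[:, j]          # distances without feature j
        scores[j] = np.abs(ref - norm_hist(reduced)).sum()
    return scores


def select_features(X, k, bins=20):
    """Keep the k features with the largest score (assumed selection rule)."""
    scores = histogram_impact_scores(X, bins=bins)
    return np.sort(np.argsort(scores)[::-1][:k])


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means with random initial centres."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):                 # leave empty clusters unchanged
                centers[c] = X[labels == c].mean(axis=0)
    return labels


# Synthetic stand-in data: 2 cluster-bearing features plus 8 pure-noise features.
rng = np.random.default_rng(0)
informative = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(6, 1, (30, 2))])
noise = rng.normal(0, 1, (60, 8))
X = np.hstack([informative, noise])

keep = select_features(X, k=2)
labels_filtered = kmeans(X[:, keep], k=2)           # filter first, then cluster
labels_plain = kmeans(X, k=2)                       # baseline on all features
print("kept features:", keep)
```

The two label vectors can then be compared with whatever cluster-quality measure is preferred. Whether this particular score favours informative features on real microarray data is not established here; the sketch only shows where a filter step sits relative to K-means.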

Editor information

Jayant R. Haritsa, Ramamohanarao Kotagiri, Vikram Pudi

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dash, M., Gopalkrishnan, V. (2008). Distance Based Feature Selection for Clustering Microarray Data. In: Haritsa, J.R., Kotagiri, R., Pudi, V. (eds) Database Systems for Advanced Applications. DASFAA 2008. Lecture Notes in Computer Science, vol 4947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78568-2_41

  • DOI: https://doi.org/10.1007/978-3-540-78568-2_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78567-5

  • Online ISBN: 978-3-540-78568-2

  • eBook Packages: Computer Science, Computer Science (R0)
