Relevant Attribute Discovery in High Dimensional Data Based on Rough Sets and Unsupervised Classification: Application to Leukemia Gene Expressions

Valdés, Julio J.; Barton, Alan J.

doi:10.1007/11548706_38

Julio J. Valdés & Alan J. Barton

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3642)

Abstract

A pipelined approach that combines two clustering algorithms with rough sets is investigated for the purpose of discovering important combinations of attributes in high-dimensional data. In many domains, data objects are described by a large number of features, as in gene expression experiments or in samples characterized by spectral information. The Leader and several k-means algorithms are used as fast procedures for simplifying the attribute set of the information systems presented to the rough set algorithms. The data submatrices described in terms of these features are then discretized with respect to the decision attribute according to different rough-set-based schemes. From the discretized data, the reducts and their derived rules are extracted and applied to test data in order to evaluate the resulting classification accuracy. This approach was explored on leukemia gene expression data in a series of experiments run in a high-throughput distributed-computing environment. The experiments led to subsets of genes with high discrimination power. Good results were obtained with no preprocessing applied to the data.
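
The attribute-simplification step can be illustrated with a minimal sketch of one-pass Leader clustering applied to attributes (genes) rather than samples. This is not the authors' implementation: the abstract does not state the similarity measure or the threshold, so Pearson correlation, the value 0.95, the function names `pearson` and `leader_clustering`, and the toy expression values are all assumptions made for illustration. Each attribute joins the first existing leader that it resembles closely enough; otherwise it becomes a new leader, and the leaders form the reduced attribute set that would be passed on to discretization and reduct computation.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return cov / (norm_u * norm_v)


def leader_clustering(attributes, threshold):
    """One-pass Leader clustering over attribute vectors.

    attributes: dict mapping attribute name -> list of values across samples.
    Returns a dict mapping each leader name -> names in its cluster (leader first).
    The similarity measure (Pearson) is an assumption of this sketch.
    """
    clusters = {}
    for name, values in attributes.items():
        for leader in clusters:
            # Join the first leader that is similar enough.
            if pearson(values, attributes[leader]) >= threshold:
                clusters[leader].append(name)
                break
        else:
            # No leader was similar enough: this attribute starts a new cluster.
            clusters[name] = [name]
    return clusters


# Made-up toy values: four "genes" measured on four samples.
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.1],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}
clusters = leader_clustering(toy, threshold=0.95)
leaders = list(clusters)  # reduced attribute set
print(leaders)
```

In this toy run `g2` is absorbed by `g1` because the two are almost perfectly correlated, while `g3` and `g4` become leaders of their own. The result depends on the order in which attributes are visited, which is a known property of single-pass Leader clustering and the price paid for its speed.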

Author information

Authors and Affiliations

National Research Council Canada, M50, 1200 Montreal Rd., Ottawa, ON, K1A 0R6
Julio J. Valdés & Alan J. Barton

Editor information

Editors and Affiliations

Department of Computer Science, University of Regina, Regina, SK, S4S 0A2, Canada; Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008 Warsaw, Poland
Dominik Ślęzak
Department of Computer Science, University of Regina, S4S 0A2, Regina, Saskatchewan, Canada
JingTao Yao & Wojciech Ziarko
Department of Electrical and Computer Engineering, University of Manitoba, R3T 5V6, Winnipeg, Manitoba, Canada
James F. Peters
College of Computer and Information Engineering, Henan University, Henan, China
Xiaohua Hu

About this paper

Cite this paper

Valdés, J.J., Barton, A.J. (2005). Relevant Attribute Discovery in High Dimensional Data Based on Rough Sets and Unsupervised Classification: Application to Leukemia Gene Expressions. In: Ślęzak, D., Yao, J., Peters, J.F., Ziarko, W., Hu, X. (eds) Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing. RSFDGrC 2005. Lecture Notes in Computer Science, vol 3642. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11548706_38

DOI: https://doi.org/10.1007/11548706_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28660-8
Online ISBN: 978-3-540-31824-8
eBook Packages: Computer Science, Computer Science (R0)