Abstract
This paper investigates easily comprehensible ways of capturing the main differences between two classes of data. In addition to examining individual differences, we also consider their neighbourhood. The new concepts are applied to three gene expression datasets to discover diagnostic gene groups. Based on the idea of prediction by collective likelihoods (PCL), a new method is proposed to classify testing samples. Its performance is competitive with several state-of-the-art algorithms.
Keywords
- Acute Lymphoblastic Leukemia
- Convex Space
- Gene Expression Dataset
- Acute Myeloblastic Leukemia
- Counterpart Class
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, J., Wong, L. (2002). Geography of Differences between Two Classes of Data. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2002. Lecture Notes in Computer Science, vol 2431. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45681-3_27
DOI: https://doi.org/10.1007/3-540-45681-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44037-6
Online ISBN: 978-3-540-45681-0
eBook Packages: Springer Book Archive