LEFT–Logical Expressions Feature Transformation: A Framework for Transformation of Symbolic Features
The accuracy of a classifier relies heavily on the encoding and representation of its input data. Many machine learning algorithms require input vectors composed of numeric values, to which arithmetic and comparison operators can be applied. However, many real-life applications collect symbolic, or 'nominal', data, for which these operators are not available. This paper presents a framework called logical expression feature transformation (LEFT), which maps symbolic attributes to a continuous domain for further processing by a learning machine. It is a generic method that can be used with any suitable clustering method and any appropriate distance metric. The proposed method was tested on synthetic and real-life datasets. The results show that the framework not only reduces dimensionality but also improves the accuracy of a classifier.
Keywords: Feature Vector; Symbolic Data; Logical Expression; Binary Encode; Breast Cancer Dataset
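The abstract does not give the details of LEFT, so the following is only a rough illustration of the general idea it describes: symbolic rows are clustered under a distance metric, and each row is then re-expressed as a short vector of continuous values. It is not the paper's algorithm and does not construct logical expressions. The Hamming-style distance, the simple k-medoids loop, the toy data, and all function names are assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, ...]
Dist = Callable[[Row, Row], float]


def hamming(a: Row, b: Row) -> float:
    """Fraction of attributes on which two symbolic rows disagree (assumed metric)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def k_medoids(rows: Sequence[Row], k: int, dist: Dist, max_iter: int = 20) -> List[Row]:
    """Very small deterministic k-medoids; stands in for 'any suitable clustering method'."""
    # Seed with the first k distinct rows so the result is reproducible.
    medoids: List[Row] = []
    for r in rows:
        if r not in medoids:
            medoids.append(r)
        if len(medoids) == k:
            break

    for _ in range(max_iter):
        # Assignment step: attach every row to its nearest medoid.
        clusters: List[List[Row]] = [[] for _ in medoids]
        for r in rows:
            nearest = min(range(len(medoids)), key=lambda i: dist(r, medoids[i]))
            clusters[nearest].append(r)

        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = [
            min(c, key=lambda cand: sum(dist(cand, o) for o in c)) if c else m
            for c, m in zip(clusters, medoids)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def transform(rows: Sequence[Row], medoids: Sequence[Row], dist: Dist) -> List[List[float]]:
    """Map each symbolic row to its distances from the medoids (k continuous features)."""
    return [[dist(r, m) for m in medoids] for r in rows]


# Toy symbolic dataset (hypothetical): colour, shape, surface.
rows: List[Row] = [
    ("red", "round", "smooth"),
    ("red", "round", "rough"),
    ("blue", "flat", "smooth"),
    ("blue", "flat", "rough"),
    ("red", "flat", "smooth"),
]

medoids = k_medoids(rows, k=2, dist=hamming)
features = transform(rows, medoids, hamming)

for row, vec in zip(rows, features):
    print(row, "->", vec)
```

In this sketch each three-attribute symbolic row becomes a two-value numeric vector, so a numeric learner can consume it directly. Swapping `hamming` or `k_medoids` for another metric or clustering routine leaves `transform` unchanged, which mirrors the abstract's claim that the framework is generic in both.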