Abstract
In most Knowledge Discovery problems the human analyst first constructs a new set of features, derived from the initial input attributes, based on a priori knowledge of the problem structure. These features are built from transformations that the analyst must select. This paper provides a first step towards a methodology that searches for near-optimal representations in classification problems by automatically selecting and composing feature transformations from an initial set of basis functions. In many cases the original representation of the problem data is not the most appropriate, and finding a new representation space closer to the structure of the problem is critical to its successful solution. Moreover, once such an optimal representation is found, many problems can be solved by a linear classification method. As a proof of concept we present two classification problems whose class distributions overlap in a very intricate way in the space of original attributes. For these problems, the proposed methodology constructs representations, based on compositions of functions from the trigonometric and polynomial bases, that yield a solution where classical learning methods such as multilayer perceptrons and decision trees fail. The methodology consists of a discrete search within the space of compositions of the basis functions followed by a linear mapping performed by a Fisher discriminant. We place special emphasis on the first part. Finding the optimal composition of basis functions is difficult because of its non-gradient nature and the large number of possible combinations. We rely on the global search capabilities of a genetic algorithm to scan the space of function compositions.
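The two-stage procedure described in the abstract can be illustrated with a minimal sketch: a toy genetic algorithm evolves fixed-length compositions of basis functions, scoring each candidate feature by a one-dimensional Fisher criterion. All names here (`BASIS`, `compose`, `evolve`, the particular basis set, GA parameters) are illustrative assumptions, not the paper's actual implementation, which uses the PGAPack library and a richer encoding.

```python
import math
import random

# Hypothetical minimal basis set; the paper draws on trigonometric
# and polynomial bases.
BASIS = {
    "sin": math.sin,
    "cos": math.cos,
    "sq": lambda x: x * x,
    "cube": lambda x: x ** 3,
    "id": lambda x: x,
}

def compose(names):
    """Return the composition f_k(...f_1(x)...) of the named basis functions."""
    def f(x):
        for n in names:
            x = BASIS[n](x)
        return x
    return f

def fisher_score(feature, xs, ys):
    """1-D Fisher criterion: between-class scatter over within-class scatter."""
    z = [feature(x) for x in xs]
    z0 = [v for v, y in zip(z, ys) if y == 0]
    z1 = [v for v, y in zip(z, ys) if y == 1]
    m0, m1 = sum(z0) / len(z0), sum(z1) / len(z1)
    s0 = sum((v - m0) ** 2 for v in z0)
    s1 = sum((v - m1) ** 2 for v in z1)
    return (m0 - m1) ** 2 / (s0 + s1 + 1e-12)

def evolve(xs, ys, depth=3, pop_size=30, generations=40, seed=0):
    """Toy GA over fixed-length compositions: elitism, one-point
    crossover, and point mutation."""
    rng = random.Random(seed)
    names = list(BASIS)
    pop = [[rng.choice(names) for _ in range(depth)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda g: fisher_score(compose(g), xs, ys),
                        reverse=True)
        elite = scored[: pop_size // 2]        # keep the best half
        children = []
        for _ in range(pop_size - len(elite)):
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, depth)
            child = a[:cut] + b[cut:]          # one-point crossover
            if rng.random() < 0.3:             # point mutation
                child[rng.randrange(depth)] = rng.choice(names)
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda g: fisher_score(compose(g), xs, ys))
```

As a usage example, for a toy problem whose class label is the sign of sin(x), the evolved composition separates the classes far better than the raw attribute, after which a Fisher discriminant on the single evolved feature suffices as the linear stage.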
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
del Valle, M., Sánchez, B., Lago-Fernández, L.F., Corbacho, F.J. (2005). Feature Discovery in Classification Problems. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_44
Print ISBN: 978-3-540-28795-7
Online ISBN: 978-3-540-31926-9