Abstract
As pointed out by Blum [Blu94], “nearly all results in Machine Learning [...] deal with problems of separating relevant from irrelevant information in some way”. This paper is concerned with structural complexity issues regarding the selection of relevant Prototypes or Features. We give the first results proving that both problems can be much harder than expected in the literature for various notions of relevance. In particular, the worst-case bounds achievable by any efficient algorithm are proven to be very large, often not far from trivial bounds. We think these results give a theoretical justification for the numerous heuristic approaches found in the literature to cope with these problems.
References
G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and M. Protasi. Complexity and Approximation: Combinatorial Optimization Problems and their Approximability Properties. Springer-Verlag, Berlin, 1999.
S. Arora. Probabilistic checking of proofs and hardness of approximation problems. Technical Report CS-TR-476-94, Princeton University, 1994.
M. Bellare. Proof checking and approximation: towards tight results. SIGACT News, 1996.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, 1984.
A. Blum and P. Langley. Selection of relevant features and examples in machine learning. Artificial Intelligence, pages 245–272, 1997.
A. Blum. Relevant examples and relevant features: thoughts from computational learning theory. In AAAI Fall Symposium (survey paper), 1994.
P. Crescenzi and V. Kann. A Compendium of NP Optimization Problems. Available at http://www.nada.kth.se/~viggo/wwwcompendium/, 2000.
T. Hancock, T. Jiang, M. Li, and J. Tromp. Lower bounds on learning decision lists and trees. In Proc. of the Symposium on Theoretical Aspects of Computer Science, 1994.
L. Hyafil and R. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, pages 15–17, 1976.
G. H. John, R. Kohavi, and K. Pfleger. Irrelevant features and the subset selection problem. In Proc. of the 11th International Conference on Machine Learning, pages 121–129, 1994.
D. S. Johnson. Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, pages 256–278, 1974.
V. Kann, S. Khanna, J. Lagergren, and A. Panconesi. On the hardness of approximating MAX k-CUT and its dual. Chicago Journal of Theoretical Computer Science, 2, 1997.
M. J. Kearns and Y. Mansour. On the boosting ability of top-down decision tree learning algorithms. In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, pages 459–468, 1996.
R. Kohavi. Feature subset selection as search with probabilistic estimates. In AAAI Fall Symposium on Relevance, 1994.
R. Kohavi and D. Sommerfield. Feature subset selection using the wrapper model: overfitting and dynamic search space topology. In Proc. of the First International Conference on Knowledge Discovery and Data Mining, 1995.
D. Koller and R. M. Sahami. Toward optimal feature selection. In Proc. of the 13th International Conference on Machine Learning, 1996.
M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. M.I.T. Press, 1994.
T. Mitchell. Machine Learning. McGraw-Hill, 1997.
R. Nock and O. Gascuel. On learning decision committees. In Proc. of the 12th International Conference on Machine Learning, pages 413–420, 1995.
R. Nock and P. Jappy. Function-free Horn clauses are hard to approximate. In Proc. of the Eighth International Conference on Inductive Logic Programming, pages 195–204, 1998.
R. Nock and P. Jappy. On the power of decision lists. In Proc. of the 15th International Conference on Machine Learning, pages 413–420, 1998.
R. Nock, P. Jappy, and J. Sallantin. Generalized graph colorability and compressibility of Boolean formulae. In Proc. of the 9th International Symposium on Algorithms and Computation, pages 237–246, 1998.
R. Nock. Learning logical formulae having limited size: theoretical aspects, methods and results. PhD thesis, Université Montpellier II, 1998. Also available as technical report RR-LIRMM-98014.
K. Pillaipakkamnatt and V. Raghavan. On the limits of proper learnability of subclasses of DNF formulae. In Proc. of the 7th International Conference on Computational Learning Theory, pages 118–129, 1994.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1994.
D. B. Skalak. Prototype and feature selection by sampling and random mutation hill climbing algorithms. In Proc. of the Eleventh International Conference on Machine Learning, pages 293–301, 1994.
M. Sebban and R. Nock. Combining feature and prototype pruning by uncertainty minimization. In Proc. of the 16th International Conference on Uncertainty in Artificial Intelligence, 2000. To appear.
M. Sebban and R. Nock. Prototype selection as an information-preserving problem. In Proc. of the 17th International Conference on Machine Learning, 2000. To appear.
R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual ACM Conference on Computational Learning Theory, pages 80–91, 1998.
D. Wilson and T. Martinez. Instance pruning techniques. In Proc. of the 14th International Conference on Machine Learning, pages 404–411, 1997.
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Nock, R., Sebban, M. (2000). Sharper Bounds for the Hardness of Prototype and Feature Selection. In: Arimura, H., Jain, S., Sharma, A. (eds) Algorithmic Learning Theory. ALT 2000. Lecture Notes in Computer Science, vol. 1968. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-40992-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41237-3
Online ISBN: 978-3-540-40992-2