Abstract
In this work we propose a method for computing a minimum-size training-set-consistent subset for the nearest neighbor rule (also known as the CNN problem) via SAT encodings. We introduce the SAT–CNN algorithm, which exploits a suitable encoding of the CNN problem as a sequence of SAT problems in order to solve it exactly, provided that enough computational resources are available. A comparison of SAT–CNN with well-known greedy methods shows that SAT–CNN is able to return better solutions. The proposed approach can be extended to several hard subset-selection classification problems.
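The abstract summarizes SAT–CNN without giving the encoding itself. As an illustrative sketch only (not the authors' encoding), the code below makes the underlying optimization problem concrete: find the smallest subset S of the training set such that every training point is classified correctly by the 1-nearest-neighbor rule using only S. The exhaustive search over subset sizes is a brute-force stand-in for the paper's sequence of SAT queries of the form "does a consistent subset of size at most k exist?"; all function names and the one-dimensional toy data are invented for illustration.

```python
from itertools import combinations

def nn_label(q, subset, X, y):
    # Label of q's nearest neighbor among the selected training indices.
    best = min(subset, key=lambda i: abs(X[i] - q))
    return y[best]

def is_consistent(subset, X, y):
    # A subset is "training-set consistent" (Hart's sense) when every
    # training point is classified correctly by 1-NN restricted to it.
    return all(nn_label(X[p], subset, X, y) == y[p] for p in range(len(X)))

def min_consistent_subset(X, y):
    # Exhaustive search in increasing subset size; the first consistent
    # subset found is therefore minimum-size. SAT-CNN replaces this
    # exponential loop with a sequence of SAT decision problems
    # "is there a consistent subset of size <= k?".
    for k in range(1, len(X) + 1):
        for subset in combinations(range(len(X)), k):
            if is_consistent(subset, X, y):
                return subset
    return tuple(range(len(X)))  # the full set is always consistent
```

For example, on the points 0, 1, 10, 11 with labels A, A, B, B, a single point can never be consistent (it misclassifies the other class), while one point per cluster suffices, so the minimum consistent subset has size two.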
Copyright information
© 2008 International Federation for Information Processing
Cite this paper
Angiulli, F., Basta, S. (2008). Optimal Subset Selection for Classification through SAT Encodings. In: Bramer, M. (eds) Artificial Intelligence in Theory and Practice II. IFIP AI 2008. IFIP – The International Federation for Information Processing, vol 276. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09695-7_30
Print ISBN: 978-0-387-09694-0
Online ISBN: 978-0-387-09695-7