Learning Interpretable Classification Rules with Boolean Compressed Sensing
- 2 Citations
- 1 Mentions
- 2.2k Downloads
Abstract
An important problem in the context of supervised machine learning is designing systems which are interpretable by humans. In domains such as law, medicine, and finance that deal with human lives, delegating the decision to a black-box machine-learning model carries significant operational risk, and often legal implications, thus requiring interpretable classifiers. Building on ideas from Boolean compressed sensing, we propose a rule-based classifier which explicitly balances accuracy versus interpretability in a principled optimization formulation. We represent the problem of learning conjunctive clauses or disjunctive clauses as an adaptation of a classical problem from statistics, Boolean group testing, and apply a novel linear programming (LP) relaxation to find solutions. We derive theoretical results for recovering sparse rules which parallel the conditions for exact recovery of sparse signals in the compressed sensing literature. This is an exciting development in interpretable learning where most prior work has focused on heuristic solutions. We also consider a more general class of rule-based classifiers, checklists and scorecards, learned using ideas from threshold group testing. We show competitive classification accuracy using the proposed approach on real-world data sets.
Keywords
Compress Sense Linear Programming Relaxation Sparse Signal Linear Programming Formulation Clinical Prediction RuleNotes
Acknowledgements
The authors thank Vijay S. Iyengar, Benjamin Letham, Cynthia Rudin, Viswanath Nagarajan, Karthikeyan Natesan Ramamurthy, Mikhail Malyutov and Venkatesh Saligrama for valuable discussions.
References
- 1.Adams, S.T., Leveson, S.H.: Clinical prediction rules. Br. Med. J. 344, d8312 (2012)CrossRefGoogle Scholar
- 2.Atia, G.K., Saligrama, V.: Boolean compressed sensing and noisy group testing. IEEE Trans. Inf. Theory 58 (3), 1880–1901 (2012)MathSciNetCrossRefzbMATHGoogle Scholar
- 3.Bertsimas, D., Chang, A., Rudin, C.: An integer optimization approach to associative classification. In: Advances in Neural Information Processing Systems 25, pp. 269–277 (2012)Google Scholar
- 4.Blum, A., Kalai, A., Langford, J.: Beating the hold-out: bounds for k-fold and progressive cross-validation. In: Proceedings of the Conference on Computational Learning Theory, Santa Cruz, CA, pp. 203–208 (1999)Google Scholar
- 5.Boros, E., Hammer, P.L., Ibaraki, T., Kogan, A., Mayoraz, E., Muchnik, I.: An implementation of logical analysis of data. IEEE Trans. Knowl. Data Eng. 12 (2), 292–306 (2000)CrossRefGoogle Scholar
- 6.Candès, E.J., Wakin, M.B.: An introduction to compressive sampling. IEEE Signal Process. Mag. 25 (2), 21–30 (2008)CrossRefGoogle Scholar
- 7.Chen, H.B., Fu, H.L.: Nonadaptive algorithms for threshold group testing. Discret. Appl. Math. 157, 1581–1585 (2009)MathSciNetCrossRefzbMATHGoogle Scholar
- 8.Cheraghchi, M., Hormati, A., Karbasi, A., Vetterli, M.: Compressed sensing with probabilistic measurements: a group testing solution. In: Proceedings of the Annual Allerton Conference on Communication Control and Computing, Allerton, IL, pp. 30–35 (2009)Google Scholar
- 9.Clark, P., Niblett, T.: The CN2 induction algorithm. Mach. Learn. 3 (4), 261–283 (1989)Google Scholar
- 10.Cohen, W.W.: Fast effective rule induction. In: Proceedings of the International Conference on Machine Learning, Tahoe City, CA, pp. 115–123 (1995)Google Scholar
- 11.Dai, L., Pelckmans, K.: An ellipsoid based, two-stage screening test for BPDN. In: Proceedings of the European Signal Processing Conference, Bucharest, Romania, pp. 654–658 (2012)Google Scholar
- 12.Dash, S., Malioutov, D.M., Varshney, K.R.: Screening for learning classification rules via Boolean compressed sensing. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Florence, Italy, pp. 3360–3364 (2014)Google Scholar
- 13.Dash, S., Malioutov, D.M., Varshney, K.R.: Learning interpretable classification rules using sequential row sampling. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Brisbane, Australia (2015)Google Scholar
- 14.Dembczyński, K., Kotłowski, W., Słowiński, R.: ENDER: a statistical framework for boosting decision rules. Data Min. Knowl. Disc. 21 (1), 52–90 (2010)MathSciNetCrossRefGoogle Scholar
- 15.Donoho, D.L., Elad, M.: Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proc. Natl. Acad. Sci. 100 (5), 2197–2202 (2003)MathSciNetCrossRefzbMATHGoogle Scholar
- 16.Du, D.Z., Hwang, F.K.: Pooling Designs and Nonadaptive Group Testing: Important Tools for DNA Sequencing. World Scientific, Singapore (2006)CrossRefzbMATHGoogle Scholar
- 17.Dyachkov, A.G., Rykov, V.V.: A survey of superimposed code theory. Prob. Control. Inf. 12 (4), 229–242 (1983)MathSciNetGoogle Scholar
- 18.Dyachkov, A.G., Vilenkin, P.A., Macula, A.J., Torney, D.C.: Families of finite sets in which no intersection of l sets is covered by the union of s others. J. Combin. Theory 99, 195–218 (2002)MathSciNetCrossRefzbMATHGoogle Scholar
- 19.Eckstein, J., Goldberg, N.: An improved branch-and-bound method for maximum monomial agreement. INFORMS J. Comput. 24 (2), 328–341 (2012)MathSciNetCrossRefzbMATHGoogle Scholar
- 20.El Ghaoui, L., Viallon, V., Rabbani, T.: Safe feature elimination in sparse supervised learning. Pac. J. Optim. 8 (4), 667–698 (2012)MathSciNetzbMATHGoogle Scholar
- 21.Emad, A., Milenkovic, O.: Semiquantitative group testing. IEEE Trans. Inf. Theory 60 (8), 4614–4636 (2014)MathSciNetCrossRefzbMATHGoogle Scholar
- 22.Frank, A., Asuncion, A.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2010)
- 23.Friedman, J.H., Popescu, B.E.: Predictive learning via rule ensembles. Ann. Appl. Stat. 2 (3), 916–954 (2008)MathSciNetCrossRefzbMATHGoogle Scholar
- 24.Fry, C.: Closing the gap between analytics and action. INFORMS Analytics Mag. 4 (6), 4–5 (2011)Google Scholar
- 25.Gage, B.F., Waterman, A.D., Shannon, W., Boechler, M., Rich, M.W., Radford, M.J.: Validation of clinical classification schemes for predicting stroke. J. Am. Med. Assoc. 258 (22), 2864–2870 (2001)CrossRefGoogle Scholar
- 26.Gawande, A.: The Checklist Manifesto: How To Get Things Right. Metropolitan Books, New York (2009)Google Scholar
- 27.Gilbert, A.C., Iwen, M.A., Strauss, M.J.: Group testing and sparse signal recovery. In: Conference Record - Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, pp. 1059–1063 (2008)Google Scholar
- 28.Jawanpuria, P., Nath, J.S., Ramakrishnan, G.: Efficient rule ensemble learning using hierarchical kernels. In: Proceedings of the International Conference on Machine Learning, Bellevue, WA, pp. 161–168 (2011)Google Scholar
- 29.John, G.H., Langley, P.: Static versus dynamic sampling for data mining. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Portland, OR, pp. 367–370 (1996)Google Scholar
- 30.Kautz, W., Singleton, R.: Nonrandom binary superimposed codes. IEEE Trans. Inf. Theory 10 (4), 363–377 (1964)CrossRefzbMATHGoogle Scholar
- 31.Letham, B., Rudin, C., McCormick, T.H., Madigan, D.: Building interpretable classifiers with rules using Bayesian analysis. Tech. Rep. 609, Department of Statistics, University of Washington (2012)Google Scholar
- 32.Liu, J., Li, M.: Finding cancer biomarkers from mass spectrometry data by decision lists. J. Comput. Biol. 12 (7), 971–979 (2005)CrossRefGoogle Scholar
- 33.Liu, J., Zhao, Z., Wang, J., Ye, J.: Safe screening with variational inequalities and its application to lasso. In: Proceedings of the International Conference on Machine Learning, Beijing, China, pp. 289–297 (2014)Google Scholar
- 34.Malioutov, D., Malyutov, M.: Boolean compressed sensing: LP relaxation for group testing. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, Japan, pp. 3305–3308 (2012)Google Scholar
- 35.Malioutov, D.M., Varshney, K.R.: Exact rule learning via Boolean compressed sensing. In: Proceedings of the International Conference on Machine Learning, Atlanta, GA, pp. 765–773 (2013)Google Scholar
- 36.Malioutov, D.M., Sanghavi, S.R., Willsky, A.S.: Sequential compressed sensing. IEEE J. Spec. Top. Signal Proc. 4 (2), 435–444 (2010)CrossRefGoogle Scholar
- 37.Malyutov, M.: The separating property of random matrices. Math. Notes 23 (1), 84–91 (1978)MathSciNetCrossRefzbMATHGoogle Scholar
- 38.Malyutov, M.: Search for sparse active inputs: a review. In: Aydinian, H., Cicalese, F., Deppe, C. (eds.) Information Theory, Combinatorics, and Search Theory: In Memory of Rudolf Ahlswede, pp. 609–647. Springer, Berlin/Germany (2013)CrossRefGoogle Scholar
- 39.Marchand, M., Shawe-Taylor, J.: The set covering machine. J. Mach. Learn. Res. 3, 723–746 (2002)MathSciNetzbMATHGoogle Scholar
- 40.Maron, O., Moore, A.W.: Hoeffding races: accelerating model selection search for classification and function approximation. Adv. Neural Inf. Proces. Syst. 6, 59–66 (1993)Google Scholar
- 41.Mazumdar, A.: On almost disjunct matrices for group testing. In: Proceedings of the International Symposium on Algorithms and Computation, Taipei, Taiwan, pp. 649–658 (2012)Google Scholar
- 42.Provost, F., Jensen, D., Oates, T.: Efficient progressive sampling. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, pp. 23–32 (1999)Google Scholar
- 43.Quinlan, J.R.: Simplifying decision trees. Int. J. Man Mach. Stud. 27 (3), 221–234 (1987)CrossRefGoogle Scholar
- 44.Rivest, R.L.: Learning decision lists. Mach. Learn. 2 (3), 229–246 (1987)Google Scholar
- 45.Rückert, U., Kramer, S.: Margin-based first-order rule learning. Mach. Learn. 70 (2–3), 189–206 (2008)CrossRefGoogle Scholar
- 46.Sejdinovic, D., Johnson, O.: Note on noisy group testing: asymptotic bounds and belief propagation reconstruction. In: Proceedings of the Annual Allerton Conference on Communication Control and Computing, Allerton, IL, pp. 998–1003 (2010)Google Scholar
- 47.Stinson, D.R., Wei, R.: Generalized cover-free families. Discret. Math. 279, 463–477 (2004)MathSciNetCrossRefzbMATHGoogle Scholar
- 48.Ustun, B., Rudin, C.: Methods and models for interpretable linear classification. Available at http://arxiv.org/pdf/1405.4047 (2014)
- 49.Wagstaff, K.L.: Machine learning that matters. In: Proceedings of the International Conference on Machine Learning, Edinburgh, United Kingdom, pp. 529–536 (2012)Google Scholar
- 50.Wang, F., Rudin, C.: Falling rule lists. Available at http://arxiv.org/pdf/1411.5899 (2014)
- 51.Wang, J., Zhou, J., Wonka, P., Ye, J.: Lasso screening rules via dual polytope projection. Adv. Neural Inf. Proces. Syst. 26, 1070–1078 (2013)zbMATHGoogle Scholar
- 52.Wang, Y., Xiang, Z.J., Ramadge, P.J.: Lasso screening with a small regularization parameter. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada, pp. 3342–3346 (2013)Google Scholar
- 53.Wang, Y., Xiang, Z.J., Ramadge, P.J.: Tradeoffs in improved screening of lasso problems. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada, pp. 3297–3301 (2013)Google Scholar
- 54.Wang, T., Rudin, C., Doshi, F., Liu, Y., Klampfl, E., MacNeille, P.: Bayesian or’s of and’s for interpretable classification with application to context aware recommender systems. Available at http://arxiv.org/abs/1504.07614 (2015)
- 55.Wu, H., Ramadge, P.J.: The 2-codeword screening test for lasso problems. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada, pp. 3307–3311 (2013)Google Scholar
- 56.Xiang, Z.J., Ramadge, P.J.: Fast lasso screening tests based on correlations. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, Japan, pp. 2137–2140 (2012)Google Scholar
- 57.Xiang, Z.J., Xu, H., Ramadge, P.J.: Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries. Advances in Neural Information Processing Systems, vol. 24, pp. 900–908. MIT Press, Cambridge, MA (2011)Google Scholar