Learning Interpretable Classification Rules with Boolean Compressed Sensing

  • Dmitry M. Malioutov
  • Kush R. Varshney
  • Amin Emad
  • Sanjeeb Dash
Chapter
Part of the Studies in Big Data book series (SBD, volume 32)

Abstract

An important problem in supervised machine learning is designing systems that are interpretable by humans. In domains such as law, medicine, and finance that deal with human lives, delegating decisions to a black-box machine-learning model carries significant operational risk, and often legal implications, so interpretable classifiers are required. Building on ideas from Boolean compressed sensing, we propose a rule-based classifier that explicitly balances accuracy against interpretability in a principled optimization formulation. We represent the problem of learning conjunctive or disjunctive clauses as an adaptation of a classical problem from statistics, Boolean group testing, and apply a novel linear programming (LP) relaxation to find solutions. We derive theoretical conditions for recovering sparse rules that parallel the conditions for exact recovery of sparse signals in the compressed sensing literature. This is an exciting development in interpretable learning, where most prior work has focused on heuristic solutions. We also consider a more general class of rule-based classifiers, checklists and scorecards, learned using ideas from threshold group testing. We demonstrate competitive classification accuracy with the proposed approach on real-world data sets.
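As a rough illustration of the kind of LP relaxation the abstract describes (not necessarily the chapter's exact formulation), the sketch below learns a sparse disjunctive clause over binary features by relaxing the Boolean rule-indicator vector to [0, 1]. The function name `learn_clause`, the trade-off weight `C`, and the way slack variables penalize errors on positive samples (with false firings on negative samples folded into the objective) are all illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def learn_clause(A, y, C=10.0):
    """LP-relaxation sketch for learning a sparse OR-clause.

    A : (n, m) binary feature matrix; y : (n,) binary labels.
    Returns relaxed rule weights w in [0, 1]^m; a final rule can be
    read off by thresholding (e.g. w_j > 0.5 selects feature j).
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y)
    pos, neg = A[y == 1], A[y == 0]
    n_pos, m = pos.shape

    # Objective: rule sparsity (sum of w) plus C times the errors.
    # Firings on negative samples are linear in w, so they fold into
    # the w-coefficients; positive-sample errors use slack variables xi.
    c = np.concatenate([1.0 + C * neg.sum(axis=0), C * np.ones(n_pos)])

    # Each positive sample must be covered: (A_i . w) + xi_i >= 1,
    # written as -(A_i . w) - xi_i <= -1 for linprog's A_ub form.
    A_ub = np.hstack([-pos, -np.eye(n_pos)])
    b_ub = -np.ones(n_pos)

    bounds = [(0, 1)] * m + [(0, None)] * n_pos
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:m]
```

On a tiny synthetic set whose true rule is "feature 0 OR feature 2", thresholding the returned weights recovers that support; in practice the LP often returns an integral vertex even though integrality is not enforced.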

Keywords

Compressed sensing · Linear programming relaxation · Sparse signals · Linear programming formulation · Clinical prediction rules
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

The authors thank Vijay S. Iyengar, Benjamin Letham, Cynthia Rudin, Viswanath Nagarajan, Karthikeyan Natesan Ramamurthy, Mikhail Malyutov and Venkatesh Saligrama for valuable discussions.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Dmitry M. Malioutov (1)
  • Kush R. Varshney (1)
  • Amin Emad (2, 3)
  • Sanjeeb Dash (1)

  1. IBM T. J. Watson Research Center, Yorktown Heights, USA
  2. Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, USA
  3. 1218 Thomas M. Siebel Center for Computer Science, University of Illinois, Urbana, USA