Journal of Computer Science and Technology, Volume 34, Issue 2, pp. 494–506

Lossless Compression of Random Forests

  • Amichai Painsky
  • Saharon Rosset
Regular Paper

Abstract

Ensemble methods are among the state-of-the-art predictive modeling approaches. Applied to modern big data, these methods often require a large number of sub-learners, where the complexity of each learner typically grows with the size of the dataset. This results in an increasing demand for storage space, which may be very costly. The problem is most pronounced in a subscriber-based environment, where a user-specific ensemble needs to be stored on a personal device with strict storage limitations (such as a cellular device). In this work we introduce a novel method for lossless compression of tree-based ensemble methods, focusing on random forests. Our suggested method is based on probabilistic modeling of the ensemble’s trees, followed by model clustering via Bregman divergence. This allows us to find a minimal set of models that provides an accurate description of the trees, and at the same time is small enough to store and maintain. Our compression scheme achieves high compression rates on a variety of modern datasets. Importantly, our scheme enables predictions from the compressed format and a perfect reconstruction of the original ensemble. In addition, we introduce a theoretically sound lossy compression scheme, which allows us to control the trade-off between the distortion and the coding rate.
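
The sketch below is a rough Python illustration of the clustering idea described above, not the authors' implementation: each tree is summarized by an empirical probability distribution (here, over which feature is used at its internal splits), and these distributions are clustered under the KL divergence, a Bregman divergence, to obtain a small set of representative models. The choice of summary statistic, the function names and the toy data are all assumptions made for this example.

    import numpy as np

    def kl(p, q, eps=1e-12):
        """KL divergence D(p || q), the Bregman divergence generated by negative entropy."""
        p = np.clip(p, eps, 1.0)
        q = np.clip(q, eps, 1.0)
        return float(np.sum(p * np.log(p / q)))

    def bregman_kmeans(dists, k, n_iter=50, seed=0):
        """Cluster probability vectors under KL divergence, k-means style."""
        rng = np.random.default_rng(seed)
        centers = dists[rng.choice(len(dists), size=k, replace=False)]
        for _ in range(n_iter):
            # Assignment step: nearest representative model in KL divergence.
            labels = np.array([np.argmin([kl(d, c) for c in centers]) for d in dists])
            # Update step: the arithmetic mean minimizes the total Bregman
            # divergence within a cluster (a standard property of Bregman divergences).
            centers = np.array([
                dists[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
        return labels, centers

    # Toy "forest": 100 trees, each summarized by its empirical distribution
    # over 10 split features (rows sum to 1). In practice these statistics
    # would be extracted from a trained random forest.
    rng = np.random.default_rng(1)
    trees = rng.dirichlet(alpha=np.ones(10), size=100)
    labels, centers = bregman_kmeans(trees, k=4)
    print("cluster sizes:", np.bincount(labels, minlength=4))

In the scheme outlined in the abstract, such a small set of representative models would then drive the entropy coding of the tree descriptions, which is what enables both prediction from the compressed format and exact reconstruction of the original ensemble.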

Keywords

entropy coding; lossless compression; lossy compression; random forest

Supplementary material

ESM 1: 11390_2019_1921_MOESM1_ESM.pdf (PDF, 838 kB)

Copyright information

© Springer Science+Business Media, LLC & Science Press, China 2019

Authors and Affiliations

  1. School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
  2. Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv, Israel
