International Journal of Computer Vision, Volume 122, Issue 2, pp 292–312

Growing Regression Tree Forests by Classification for Continuous Object Pose Estimation

Abstract

We propose a novel node splitting method for regression trees and incorporate it into the random regression forest framework. Unlike traditional binary splitting, where the splitting rule is selected from a predefined set of binary splitting rules via trial-and-error, the proposed node splitting method first finds clusters in the training data which at least locally minimize the empirical loss without considering the input space. Splitting rules that preserve the found clusters as much as possible are then determined by casting the problem as a classification problem. Consequently, our new node splitting method enjoys more freedom in choosing the splitting rules, resulting in more efficient tree structures. In addition to the algorithm for the ordinary Euclidean target space, we present a variant which can naturally deal with a circular target space through the proper use of circular statistics. To deal with challenging, ambiguous image-based pose estimation problems, we also present a voting-based ensemble method using the mean shift algorithm. Furthermore, to address the data imbalance present in some of the datasets, we propose a bootstrap sampling method using a sample weighting technique. We apply the proposed random regression forest algorithm to head pose estimation, car direction estimation and pedestrian orientation estimation tasks, and demonstrate its competitive performance.
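The two-stage split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scalar Euclidean targets, uses plain 2-means for the clustering stage, and substitutes a single-feature threshold (a decision stump) for whatever classifier the paper actually trains. The names `two_means_1d`, `fit_stump` and `split_node` are hypothetical.

```python
from statistics import mean


def two_means_1d(targets, iters=20):
    """Stage 1: cluster scalar targets into two groups, ignoring the inputs.

    Plain k-means with k=2, initialised at the min and max target.
    Assumes the targets contain at least two distinct values.
    """
    c0, c1 = min(targets), max(targets)
    labels = []
    for _ in range(iters):
        labels = [0 if abs(t - c0) <= abs(t - c1) else 1 for t in targets]
        g0 = [t for t, l in zip(targets, labels) if l == 0]
        g1 = [t for t, l in zip(targets, labels) if l == 1]
        if not g0 or not g1:
            break
        c0, c1 = mean(g0), mean(g1)
    return labels


def fit_stump(features, labels):
    """Stage 2: find the (feature, threshold) that best reproduces the labels.

    A decision stump stands in for the classifier here (an assumption).
    Assumes at least one feature takes more than one value.
    """
    n, d = len(features), len(features[0])
    best = (-1, None, None)  # (agreement count, feature index, threshold)
    for j in range(d):
        values = sorted({x[j] for x in features})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            agree = sum((x[j] > thr) == bool(l) for x, l in zip(features, labels))
            agree = max(agree, n - agree)  # either side may hold cluster 1
            if agree > best[0]:
                best = (agree, j, thr)
    return best[1], best[2], best[0] / n


def split_node(features, targets):
    """Cluster the targets, then route samples with the learned rule."""
    labels = two_means_1d(targets)
    j, thr, accuracy = fit_stump(features, labels)
    left = [i for i, x in enumerate(features) if x[j] <= thr]
    right = [i for i, x in enumerate(features) if x[j] > thr]
    return j, thr, accuracy, left, right


if __name__ == "__main__":
    X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [0.8, 2.0]]  # toy inputs
    y = [10.0, 12.0, 80.0, 85.0]                          # toy pose angles
    print(split_node(X, y))
```

In this toy example the targets fall into a low group and a high group, and the first feature separates them, so the learned rule sends samples 0 and 1 to one child and samples 2 and 3 to the other. The circular-target variant mentioned in the abstract would need a clustering step based on circular statistics, which this sketch does not attempt.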

Keywords

Pose estimation · Direction estimation · Orientation estimation · Regression tree · Random regression forest · Decision tree · Mean shift

Notes

Acknowledgments

This research was supported by a MURI Grant from the US Office of Naval Research under N00014-10-1-0934.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Center for Automation Research, UMIACS, University of Maryland, College Park, USA
