Segmenting and classifying activities in robot-assisted surgery with recurrent neural networks
Automatically segmenting and classifying surgical activities is an important prerequisite to providing automated, targeted assessment and feedback during surgical training. Prior work has focused almost exclusively on recognizing gestures, or short, atomic units of activity such as pushing a needle through tissue, whereas we also focus on recognizing higher-level maneuvers, such as a suture throw. Maneuvers exhibit more complexity and variability than the gestures from which they are composed; however, working at this granularity has the benefit of being consistent with existing training curricula.
Prior work has focused on hidden-Markov-model- and conditional-random-field-based methods, which typically leverage unary terms that are local in time and linear in model parameters. Because maneuvers are governed by long-term, nonlinear dynamics, we argue that the more expressive unary terms offered by recurrent neural networks (RNNs) are better suited for this task. Four RNN architectures are compared for recognizing activities from kinematics: simple RNNs, long short-term memory (LSTM), gated recurrent units (GRUs), and mixed-history RNNs. We report performance in terms of error rate and edit distance, and we use a functional analysis-of-variance framework to assess hyperparameter sensitivity for each architecture.
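To make the contrast with linear unary terms concrete, a single gated-recurrent-unit step over kinematic frames can be sketched in NumPy as below. This is an illustrative sketch only, not the authors' implementation; the function names, dimensions, and random initialization are assumptions for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, params):
    """One GRU step: x is one kinematic frame, h is the previous hidden state."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate hidden state
    return (1.0 - z) * h + z * h_cand         # interpolate old and candidate states

def run_gru(frames, n_hidden, rng):
    """Unroll the GRU over a (T, n_in) sequence of kinematic frames."""
    n_in = frames.shape[1]
    params = [rng.standard_normal((n_hidden, d)) * 0.1
              for d in (n_in, n_hidden, n_in, n_hidden, n_in, n_hidden)]
    h = np.zeros(n_hidden)
    states = []
    for x in frames:
        h = gru_step(x, h, params)
        states.append(h)
    return np.stack(states)  # (T, n_hidden): one nonlinear unary feature per frame
```

Because each hidden state depends nonlinearly on the entire history of frames, a per-frame classifier on these states can capture the long-term dynamics that local, linear unary terms cannot.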
We obtain state-of-the-art performance for both maneuver recognition from kinematics (4 maneuvers; error rate of \(8.6 \pm 3.4\%\); normalized edit distance of \(9.3 \pm 4.3\%\)) and gesture recognition from kinematics (10 gestures; error rate of \(15.2 \pm 6.0\%\); normalized edit distance of \(8.4 \pm 6.3\%\)).
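The normalized edit distance reported above compares the predicted segment-label sequence against the ground-truth sequence. A minimal sketch of this metric, assuming standard Levenshtein distance normalized by the longer sequence length (the exact normalization used by the authors may differ), is:

```python
def edit_distance(pred, true):
    """Levenshtein distance between two segment-label sequences."""
    m, n = len(pred), len(true)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deleting i predicted segments
    for j in range(n + 1):
        d[0][j] = j  # inserting j true segments
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == true[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def normalized_edit_distance(pred, true):
    """Edit distance as a percentage of the longer sequence's length."""
    return 100.0 * edit_distance(pred, true) / max(len(pred), len(true))
```

Unlike the frame-level error rate, this segment-level metric penalizes spurious or missed segments rather than their durations.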
Automated maneuver recognition is feasible with RNNs, an exciting result which offers the opportunity to provide targeted assessment and feedback at a higher level of granularity. In addition, we show that multiple hyperparameters are important for achieving good performance, and our hyperparameter analysis serves to aid future work in RNN-based activity recognition.
Keywords: Robot-assisted surgery · Recurrent neural networks · Gesture recognition · Maneuver recognition · Surgical activity recognition
This research was supported by NSF Grant OISE-1065092, “A US-Germany Research Collaboration on Systems for Computer-Integrated Healthcare,” and by a fellowship for modeling, simulation, and training from the Link Foundation (Grant No. 90078471).
Compliance with ethical standards
Conflicts of interest
The authors declare that they have no conflicts of interest.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent was obtained from all individual participants included in the study.