Inexpensive benchtop training systems offer significant advantages in meeting the increasing demand for training surgeons and gastroenterologists in flexible endoscopy. Established scoring systems exist, based on task duration and mistake evaluation. However, they require trained human raters, which limits broad and low-cost adoption. There is therefore an important unmet need to automate rating with machine learning.
We present a general and robust approach for recognizing training tasks from endoscopic training video, which in turn automates task duration computation. Our main technical novelty is to show that the performance of state-of-the-art CNN-based approaches can be improved significantly with a novel semi-supervised learning approach that uses both labelled and unlabelled videos. For the unlabelled videos, we assume only that the task execution order is known a priori.
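The assumption that the task execution order is known can be exploited to segment an unlabelled video from frame-wise classifier scores alone. As a rough illustration of this idea (not the paper's actual method, whose details are not given here), the following sketch uses dynamic programming to find the best contiguous segmentation of a video into the known task sequence, given hypothetical per-frame log-probabilities from a CNN classifier:

```python
import numpy as np

def ordered_segmentation(log_probs, task_order):
    """Split a video into contiguous segments following a known task order.

    log_probs: (T, K) array of per-frame class log-probabilities from a
        frame classifier; K is the number of task classes.
    task_order: list of class indices in their known execution order.
    Returns the frame index at which each task is estimated to start.
    """
    T = log_probs.shape[0]
    S = len(task_order)
    NEG = -np.inf
    # dp[s, t]: best total score of frames 0..t with frame t in segment s
    dp = np.full((S, T), NEG)
    advanced = np.zeros((S, T), dtype=bool)  # True if segment s starts at t
    dp[0, 0] = log_probs[0, task_order[0]]
    for t in range(1, T):
        for s in range(S):
            stay = dp[s, t - 1]                      # remain in task s
            move = dp[s - 1, t - 1] if s > 0 else NEG  # advance to task s
            best = max(stay, move)
            dp[s, t] = best + log_probs[t, task_order[s]]
            advanced[s, t] = move > stay
    # Backtrack from the last segment at the last frame.
    starts = [0] * S
    s = S - 1
    for t in range(T - 1, 0, -1):
        if advanced[s, t]:
            starts[s] = t
            s -= 1
    return starts
```

Task durations then follow directly from consecutive start indices (and the video length) multiplied by the frame period. All names and shapes here are assumptions for illustration.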
Two video datasets are presented: the first has 19 videos recorded in examination conditions, where the participants complete their tasks in a predetermined order. The second has 17 h of video recorded in self-assessment conditions, where participants complete one or more tasks in any order. For the first dataset, we obtain a mean task duration estimation error of 3.65 s, with a mean task duration of 159 s (\(2.3\%\) relative error). For the second dataset, our semi-supervised learning approach reduces the mean task duration estimation error from an average of 5.63% to 3.67%.
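The relative-error figure for the first dataset follows directly from the two quantities reported: the mean duration error divided by the mean task duration. A minimal check, with an additional mean-absolute-error computation over hypothetical (not the paper's) per-task durations:

```python
# Relative error for the first dataset: mean duration error over mean
# task duration, as quoted in the text (3.65 s over 159 s).
relative_error = 3.65 / 159 * 100  # in percent, rounds to 2.3

# Mean absolute duration error over a set of tasks; these predicted and
# ground-truth durations (in seconds) are illustrative only.
predicted = [158.0, 161.5, 155.2]
truth = [159.0, 160.0, 159.0]
mae = sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)
```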
This work is a first significant step towards automated rating of flexible endoscopy trainees on a low-cost benchtop trainer. Thanks to our semi-supervised learning approach, the method scales easily to much larger unlabelled training datasets. The approach can also be used for other phase recognition tasks.
Conflict of interest
Dr. Shlomovitz holds a trademark for the BEST Box. All other co-authors declare that they have no conflict of interest. This study was funded by IRCAD France. This article does not contain any studies with human participants or animals performed by any of the authors. Informed consent was obtained from all individual participants included in the study.
Bencteux, V., Saibro, G., Shlomovitz, E. et al. Automatic task recognition in a flexible endoscopy benchtop trainer with semi-supervised learning. Int J CARS (2020). https://doi.org/10.1007/s11548-020-02208-w
- Flexible endoscopy
- Benchtop simulator
- Phase recognition
- Semi-supervised learning