Automatic task recognition in a flexible endoscopy benchtop trainer with semi-supervised learning



Inexpensive benchtop training systems offer significant advantages in meeting the increasing demand for training surgeons and gastroenterologists in flexible endoscopy. Established scoring systems exist, based on task duration and mistake evaluation. However, they require trained human raters, which limits broad and low-cost adoption. There is an important unmet need to automate rating with machine learning.


We present a general and robust approach for recognizing training tasks from endoscopic training video, which in turn automates the computation of task duration. Our main technical novelty is to show that the performance of state-of-the-art CNN-based approaches can be improved significantly with a novel semi-supervised learning approach that uses both labelled and unlabelled videos. For the unlabelled videos, we assume only that the task execution order is known a priori.
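As an illustration of how a known task order could be exploited on unlabelled video, the sketch below decodes per-frame CNN outputs into labels that must follow that order, using a Viterbi-style dynamic program. This is an assumption made for illustration, not the method confirmed by the abstract; the function name `decode_with_known_order` and its parameters `log_probs` and `order` are hypothetical. Labels obtained this way could serve as pseudo-labels for further training.

```python
import numpy as np


def decode_with_known_order(log_probs: np.ndarray, order: list[int]) -> np.ndarray:
    """Return per-frame labels that visit the tasks in `order`, one after another.

    log_probs: (T, C) array of per-frame log-probabilities from a frame classifier.
    order:     class indices in the (assumed known) execution order.
    Each task occupies at least one frame; tasks are never revisited.
    """
    n_frames, n_tasks = log_probs.shape[0], len(order)
    if n_frames < n_tasks:
        raise ValueError("need at least one frame per task")

    score = np.full((n_frames, n_tasks), -np.inf)
    # back[t, k] is 1 if the best path reached task k at frame t by advancing
    # from task k-1, and 0 if it stayed in task k.
    back = np.zeros((n_frames, n_tasks), dtype=int)
    score[0, 0] = log_probs[0, order[0]]

    for t in range(1, n_frames):
        for k in range(n_tasks):
            stay = score[t - 1, k]
            advance = score[t - 1, k - 1] if k > 0 else -np.inf
            if advance > stay:
                score[t, k] = advance + log_probs[t, order[k]]
                back[t, k] = 1
            else:
                score[t, k] = stay + log_probs[t, order[k]]

    # Backtrack from the last task at the last frame.
    labels = np.empty(n_frames, dtype=int)
    k = n_tasks - 1
    for t in range(n_frames - 1, -1, -1):
        labels[t] = order[k]
        k -= back[t, k]
    return labels


# Toy example: 6 frames, 3 classes, tasks executed in the order 2 -> 0 -> 1.
# Frame 3 is ambiguous: taken alone it would be classified as task 2 again.
probs = np.array([
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
    [0.4, 0.1, 0.5],
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
])
labels = decode_with_known_order(np.log(probs), order=[2, 0, 1])
print(labels.tolist())  # [2, 2, 0, 0, 0, 1]
```

A per-frame argmax on the same toy input would give 2, 2, 0, 2, 0, 1, which violates the order; the constrained decoding resolves the ambiguous frame in favour of the ongoing task.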


Two video datasets are presented. The first has 19 videos recorded in examination conditions, where the participants complete their tasks in a predetermined order. The second has 17 h of videos recorded in self-assessment conditions, where participants complete one or more tasks in any order. For the first dataset, we obtain a mean task duration estimation error of 3.65 s, with a mean task duration of 159 s (2.3% relative error). For the second dataset, we obtain a mean task duration estimation error of 3.67 s. Our semi-supervised learning approach reduces the average error from 5.63% to 3.67%.
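The relation between frame-level recognition and the reported duration errors can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `task_durations`, `mean_abs_error` and `relative_error` are hypothetical, and the frame rate `fps` is a placeholder, since the abstract does not state it. Only the final check uses figures from the text (3.65 s error, 159 s mean duration).

```python
from collections import Counter


def task_durations(labels, fps):
    """Seconds spent in each task, given one task label per video frame."""
    counts = Counter(labels)
    return {task: n / fps for task, n in counts.items()}


def mean_abs_error(predicted, reference):
    """Mean absolute duration error in seconds over the reference tasks."""
    errors = [abs(predicted.get(task, 0.0) - seconds)
              for task, seconds in reference.items()]
    return sum(errors) / len(errors)


def relative_error(mean_error_s, mean_duration_s):
    """Mean error expressed as a fraction of the mean task duration."""
    return mean_error_s / mean_duration_s


# Figures quoted for the first dataset: 3.65 s error, 159 s mean duration.
print(f"{relative_error(3.65, 159):.1%}")  # 2.3%
```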


This work is the first significant step toward automating the rating of flexible endoscopy students using a low-cost benchtop trainer. Thanks to our semi-supervised learning approach, we can scale easily to much larger unlabelled training datasets. The approach can also be used for other phase recognition tasks.





Author information



Corresponding author

Correspondence to Valentin Bencteux.

Ethics declarations

Conflict of interest

Dr. Shlomovitz holds a trademark for the BEST Box. All other co-authors declare that they have no conflict of interest. This study was funded by IRCAD France. This article does not contain any studies with human participants or animals performed by any of the authors. Informed consent was obtained from all individual participants included in the study.

Electronic supplementary material

Supplementary material 1 (mp4 45262 KB)

Supplementary material 2 (mp4 17887 KB)

Supplementary material 3 (pdf 137 KB)

About this article

Cite this article

Bencteux, V., Saibro, G., Shlomovitz, E. et al. Automatic task recognition in a flexible endoscopy benchtop trainer with semi-supervised learning. Int J CARS (2020).


Keywords

  • Flexible endoscopy
  • Education
  • Skill
  • Benchtop simulator
  • Phase recognition
  • Semi-supervised learning