Towards Customized Automatic Segmentation of Subtitles

  • Aitor Álvarez
  • Haritz Arzelus
  • Thierry Etchegoyhen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8854)


Automatic subtitling through speech recognition technology has become an important topic in recent years, where the effort has mostly centered on improving core speech technology to obtain better recognition results. However, subtitling quality also depends on other parameters aimed at favoring the readability and quick understanding of subtitles, like correct subtitle line segmentation. In this work, we present an approach to automate the segmentation of subtitles through machine learning techniques, allowing the creation of customized models adapted to the specific segmentation rules of subtitling companies. Support Vector Machines and Logistic Regression classifiers were trained over a reference corpus of subtitles manually created by professionals and used to segment the output of speech recognition engines. We describe the performance of both classifiers and discuss the merits of the approach for the automatic segmentation of subtitles.


Keywords: automatic subtitling, subtitle segmentation, machine learning
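The idea summarized in the abstract, learning line breaks from professionally segmented subtitles and applying them to recognizer output, can be read as binary classification over word boundaries. The following is a minimal sketch under stated assumptions, not the authors' implementation: the toy corpus, the four features, the 37-character limit, and all function names are hypothetical, and a small hand-rolled logistic regression stands in for the library classifiers used in the paper.

```python
import math

MAX_CHARS = 37  # assumed per-line limit; real limits depend on each company's rules
# Assumed closed list of words that tend to open a new line in the toy data.
FUNCTION_WORDS = {"and", "but", "or", "to", "of", "in", "that", "because"}

# Hypothetical reference corpus: each subtitle is a list of manually segmented lines.
REFERENCE_SUBTITLES = [
    ["We describe the performance", "of both classifiers."],
    ["Quality depends on readability,", "and on correct segmentation."],
    ["The models were trained", "to segment the output."],
    ["Subtitles must be easy to read,", "because viewers have little time."],
]

def boundary_features(words, i, line_len):
    """Features for the gap after words[i]; line_len is the current line's length."""
    return [
        1.0,                                                    # bias term
        1.0 if words[i + 1].lower() in FUNCTION_WORDS else 0.0, # next word type
        1.0 if words[i][-1] in ",.;:?!" else 0.0,               # punctuation before gap
        line_len / MAX_CHARS,                                   # how full the line is
    ]

def examples_from_subtitle(lines):
    """Turn one reference subtitle into (features, label) pairs, one per word gap."""
    words, breaks = [], set()
    for line in lines:
        words.extend(line.split())
        breaks.add(len(words) - 1)  # index of the last word on each line
    examples, line_len = [], 0
    for i in range(len(words) - 1):
        line_len += len(words[i]) + (1 if line_len else 0)
        label = 1 if i in breaks else 0
        examples.append((boundary_features(words, i, line_len), label))
        if label:
            line_len = 0
    return examples

def predict(w, x):
    """Probability of a line break, from a logistic model with weights w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.5):
    """Fit logistic regression weights with plain stochastic gradient ascent."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            p = predict(w, x)
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def segment(words, w):
    """Greedy segmentation: break when the model says so or the line would overflow."""
    lines, current, line_len = [], [], 0
    for i, word in enumerate(words):
        current.append(word)
        line_len += len(word) + (1 if line_len else 0)
        if i == len(words) - 1:
            break
        wants_break = predict(w, boundary_features(words, i, line_len)) > 0.5
        too_long = line_len + 1 + len(words[i + 1]) > MAX_CHARS
        if wants_break or too_long:
            lines.append(" ".join(current))
            current, line_len = [], 0
    lines.append(" ".join(current))
    return lines

examples = [ex for sub in REFERENCE_SUBTITLES for ex in examples_from_subtitle(sub)]
weights = train(examples)
```

Calling `segment(text.split(), weights)` on a recognized sentence returns a list of lines. Retraining on another company's reference subtitles yields a different set of weights, which is the sense in which such models are customizable.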





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Aitor Álvarez (1)
  • Haritz Arzelus (1)
  • Thierry Etchegoyhen (1)
  1. Human Speech and Language Technologies, Vicomtech-IK4, San Sebastián, Spain
