WebTransc — A WWW Interface for Speech Corpora Production and Processing

Valenta, Tomáš; Šmídl, Luboš

doi:10.1007/978-3-319-23132-7_60


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9319)

Included in the following conference series:

International Conference on Speech and Computer


Abstract

This paper describes a web application designed for preparing and processing speech corpora, which are key data sources for automatic speech recognition (ASR), natural language processing (NLP), speech synthesis (TTS) and many other tasks. Users can process corpora with nothing more than a web browser and an internet connection. The application has been used, upgraded and improved over several years, and its history is also described here. The experience with speech corpora processing gained during that time is presented as a set of good practices.


References

Barras, C., Geoffrois, E., Wu, Z., Liberman, M.: Transcriber: development and use of a tool for assisting speech corpora production. Speech Commun. 33(1–2), 5–22 (2001). http://www.sciencedirect.com/science/article/B6V1C-41SBGXX-2/2/6e7ee46d45ac6bc627f6ae738ca95461
Boersma, P.: Praat, a system for doing phonetics by computer. Glot Int. 5(9/10), 341–345 (2001)
Callison-Burch, C., Dredze, M.: Creating speech and language data with Amazon’s Mechanical Turk. In: Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, CSLDAMT 2010, pp. 1–12. Association for Computational Linguistics, Stroudsburg (2010). http://portal.acm.org/citation.cfm?id=1866697
Grůber, M.: Acoustic Analysis of Czech Expressive Recordings from a Single Speaker in Terms of Various Communicative Functions. In: Proceedings of the 11th IEEE International Symposium on Signal Processing and Information Technology, pp. 267–272. IEEE, New York (2011). http://www.kky.zcu.cz/en/publications/GruberM_2011_AcousticAnalysisof
Müller, L., Psutka, J.V., Šmídl, L.: Design of speech recognition engine. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2000. LNCS (LNAI), vol. 1902, pp. 259–264. Springer, Heidelberg (2000). http://link.springer.com/chapter/10.1007/3-540-45323-7_44
Psutka, J., Müller, L., Matoušek, J., Radová, V.: Mluvíme s počítačem česky. Academia, Praha (2006)
Valenta, T., Šmídl, L., Švec, J., Soutner, D.: Inter-annotator agreement on spontaneous Czech language. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2014. LNCS, vol. 8655, pp. 390–397. Springer, Heidelberg (2014). http://link.springer.com/10.1007/978-3-319-10816-2_47
Šmídl, L.: Air Traffic Control Communication, LINDAT/CLARIN digital library at Institute of Formal and Applied Linguistics, Charles University in Prague (2011). http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0
Šmídl, L., Pražák, A.: OVM – Otázky Václava Moravce, LINDAT/CLARIN digital library at Institute of Formal and Applied Linguistics, Charles University in Prague (2013). http://hdl.handle.net/11858/00-097C-0000-000D-EC98-3
Šmídl, L., Psutka, J.: Comparison of keyword spotting methods for searching in speech. In: Interspeech 2006, pp. 1894–1897 (2006). http://www.kky.zcu.cz/en/publications/SmidlL_2006_Comparisonofkeyword
Švec, J., Hoidekr, J., Soutner, D., Vavruška, J.: Web text data mining for building large scale language modelling corpus. In: Habernal, I., Matoušek, V. (eds.) TSD 2011. LNCS, vol. 6836, pp. 356–363. Springer, Heidelberg (2011). http://www.kky.zcu.cz/en/publications/JanSvec_2011_Webtextdatamining
Švec, J., Šmídl, L.: Prototype of Czech spoken dialog system with mixed initiative for railway information service. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2010. LNCS, vol. 6231, pp. 568–575. Springer, Heidelberg (2010). http://dx.doi.org/10.1007/978-3-642-15760-8_72


Acknowledgements

This research was supported by the Technology Agency of the Czech Republic, project No. TE01020197, and by the grant of the University of West Bohemia, project No. SGS-2013-032.

The data used in this paper are available in the LINDAT/CLARIN repository [8, 9].

Author information

Authors and Affiliations

Department of Cybernetics, Faculty of Applied Sciences, New Technologies for Information Society, University of West Bohemia, Technická 8, 306 14, Plzeň, Czech Republic
Tomáš Valenta & Luboš Šmídl


Corresponding author

Correspondence to Tomáš Valenta.

Editor information

Editors and Affiliations

SPIIRAS, Saint-Petersburg, Russia
Andrey Ronzhin
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Patras, Patras, Greece
Nikos Fakotakis

About this paper

Cite this paper

Valenta, T., Šmídl, L. (2015). WebTransc — A WWW Interface for Speech Corpora Production and Processing. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_60


DOI: https://doi.org/10.1007/978-3-319-23132-7_60
Published: 04 September 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science, Computer Science (R0)
